Foreword



This volume contains the invited papers, contributed papers, and poster sum- 
maries accepted for the Fifth International Conference on Artificial Intelligence 
and Symbolic Computation (AISC 2000). The conference was held in Spain from
17 to 19 July 2000 at the Hotel NH Zurbano, Madrid, and was organized by the
Universidad Complutense de Madrid and the Sociedad Matemática Puig Adam.

One of the reasons for centralizing all activities at one hotel was to avoid both 
losing time in transportation around Madrid and the distribution of the atten- 
dees into disconnected subgroups in diverse locations. In this way a breakfast- 
to-late-night coexistence (which included some extra-academic events) was en- 
sured, with time for formal and informal conversations. This continued the 
AISMC/AISC tradition of the creation of a friendly atmosphere, where ideas 
could be exchanged in a relaxed and effective way. 

The conference belongs to a specialized conference series founded by John 
Campbell and Jacques Calmet with the initial title “Artificial Intelligence and 
Symbolic Mathematical Computation” (AISMC). AISMC-1 took place in 1992 in 
Karlsruhe (Germany); AISMC-2 was held in 1994 at King’s College (Cambridge, 
UK), and AISMC-3 in 1996 was located in Steyr (Austria). The proceedings of 
these conferences were published in Springer’s LNCS series as volumes 737, 958, 
and 1138, respectively. 

The Steering Committee then decided to drop the word “Mathematical” from 
the name of the conference series (and the “M” in the acronym) to emphasize 
that the conference was not only related to Mathematics but to all aspects of 
symbolic computation. Therefore, the proceedings from that time onwards have 
been transferred to Springer’s LNAI series from LNCS. Our first conference after 
that decision, AISC’98, took place in Plattsburgh (NY, USA) during 1998. Its 
papers appeared as volume 1476 of the LNAI series. 

The next conference in the same field, AISC 2002, will be held in Nice (France). 

The field includes a wide range of activities, such as Automated Theorem 
Proving, Logical Reasoning, Mathematical Modeling of Multi-agent Systems, Ex- 
pert Systems and Machine Learning, Engineering, and Industrial Applications. 
Despite this breadth of coverage, the program committee has (as in previous 
conferences) kept the number of accepted papers low, following a strict referee- 
ing process, to avoid any necessity for parallel sessions and to allow longer than 
usual presentations and periods for questions and discussion. A poster session 
was included in 2000 for the first time in an AISC conference; short accounts of
some of the research covered in posters are included here.

In some of the past AISMC/AISC volumes and in papers in related journals 
such as the Annals of Mathematics and Artificial Intelligence, forecasts have 
been made about the areas of likely and promising future research in topics 
that the conferences cover. As the saying goes, “it is always difficult to predict,
especially to predict the future”; so, it is not embarrassing to observe that some
of the predictions have still to match what is actually happening. But others, 
such as the explicit representation (for computer-based use) of mathematical 
knowledge, are now emerging, as the present volume shows. The fact that this 
issue of representation has been encouraged indirectly by the expansion of the 
World-Wide Web and more directly by the existence of HTML and various 
subsequent ....MLs was not something that was predicted - but such unexpected 
synergies show that the AISC area is not only alive and well, but is still capable 
of generating pleasant scientific surprises. 

The book begins with the papers from the three invited speakers. The papers 
that follow have been grouped so that the topics of successive items are as close 
together as possible. 

We gratefully acknowledge the generous sponsors of AISC 2000: the Universi-
dad Complutense de Madrid (through different sources: the “Convocatoria 1999
para la Organización de Reuniones, Congresos y Seminarios”, its Department of
Algebra and its “Servicios Informáticos”), the Spanish software sales representa-
tive “Addlink Software Científico”, the companies Texas Instruments and Logic
Programming Associates (LPA), the “Real Sociedad Matemática Española”, and
the “Sociedad Matemática Puig Adam”.

We also express our warm thanks to the members of the Steering Committee 
and Scientific Committee for refereeing contributed papers and their most valu- 
able help in making AISC 2000 a success. We would like to thank additionally the 
members of the Local Committee, who faced all the “behind the curtains” work, 
and especially Professor Roanes-Macias for taking care of all the unpleasant and 
endless economic details. Finally we put on record our thanks to the director 
of the corresponding BBVA bank office (Mr. Miguel Santos) for his kindness 
and efficiency; the travel agency “Viajes Eurobusiness” for their good work and 
for trusting the local organizers of the conference by not asking for the usual 
imposing financial guarantees in advance; and the management and staff of the
Hotel NH Zurbano for the facilities provided for the conference. 
John Campbell
Eugenio Roanes-Lozano 
George Boole, a Forerunner of Symbolic
Computation* 



Luis M. Laita^, Luis de Ledesma^, Eugenio Roanes-Lozano^, and 
Alberto Brunori^ 

^ Universidad Politecnica de Madrid, Dept. Artificial Intelligence, Campus de 
Montegancedo, Boadilla del Monte, 28660-Madrid, Spain 
^ Universidad Complutense de Madrid, Dept. Algebra, Edificio “Almudena”, 
c/ Rector Royo Villanova s/n, 28040-Madrid, Spain 



Abstract. We examine in this invited presentation Boole’s principles of 
logic and his method of performing inferences. The principles of Boole’s 
logic are based on the application of an early symbolic calculus known 
in his time as the “method of separation of symbols” . His logic’s in- 
ference procedures are symbolic operations allowed inside this method. 
Such inference procedures are reinterpreted and generalized using com- 
puter algebra. The lecture also presents a short biography of Boole and 
a description of some of the factors that had an influence on the genesis 
of his logic. 



1 Introduction 

George Boole is recognized as one of the precursors of mathematical logic. Nev-
ertheless, closer insight into the genesis of his logic leads one to think that
Boole was also a forerunner of important developments in symbolic computa-
tion. He clearly could not use computers, but he suggested methods that can be
translated into interesting modern computer algebra results. Section 3 deals with
the influence of Boole’s own work in a symbolic method called “the method of
separation of symbols” on the making of his first book on logic, The Mathe-
matical Analysis of Logic [10] (hereinafter denoted MAL). A computer
algebra translation of Boole’s inference procedures is provided. This section is
an outline of the more elaborate arguments we have presented in [36].

A relevant part of this lecture (section 2) is devoted to a condensed
biography of Boole, which includes a short account of two items that deal with in-
fluences on the genesis of his logic: the controversy between the philosopher
William Hamilton and the mathematician Augustus De Morgan, and a curious
view of Boole that can be extracted from the writings of
his wife Mary Everest (biographies of Mary Everest are [14] and [15]). These two
items are mentioned to stress how factors, external and objective in the first
case and internal and subjective in the second, influence scientific creation. An excellent biography
of Boole is [39]; this and other biographies of Boole, such as [31] and [17], are 

* Partially supported by project DGES PB96-0098-C04 (Spain). 
based, mainly but not only, on the one written by Harley shortly after Boole died
[27] and on the account of Boole’s life given by Mary Everest in “Home Side of
a Scientific Mind” (to be cited hereinafter as “HS”) [18]. The controversy and
Everest’s views have been described in detail in [33] and [34].

2 Aspects of Boole’s Life 

George Boole was born in Lincoln, England, on the 2nd of November, 1815. 

Because of the poverty of his family, his formal education was minimal. His 
fellow students considered him to be something of a genius ([27], p. 428). 

By the age of twelve, George’s interests moved from the elementary science 
taught to him by his father to languages. This early training in languages had 
its share among the influences which led to the construction of Boole’s logic: he 
built his logic in the same way he felt languages were built. 

Several biographers of Boole and Mary Everest tell of his desire, when
he was about fourteen years old, to enter the ministry of the Anglican Church.
Curiously, the only one who does not mention this is precisely his first and most
reliable biographer, Harley. In a way it seems that Harley tried to avoid all
references to Boole’s religious attitudes. Nevertheless he quoted in a footnote a
letter that Boole wrote to him from London early in 1864. The letter is revealing
in regard to Boole’s religious feelings, if compared with the accounts of this issue
given later by Mary Everest:

(...) I have just returned from hearing Maurice. To say that I was pleased 
is to say nothing (...). But I should not express my real feeling if I said 
less than that I listened to him with a sense of awe. (...) I feel with you 
that I should not like to leave the Church while Maurice is in it ([27],
p. 460).

Maurice was a preacher of a kind of Christian-socialist theory, who was very
much admired by Boole. The last statement of Boole’s letter implies that he had
doubts about his continuing in the Church, or that at least he questioned some
of its teachings.

Whether or not Boole had thought of an ecclesiastical career, it is known 
that he did not carry out his ideas. Instead he became a teacher, successively in 
Doncaster, Waddington, and Lincoln. 

Boole was a successful teacher. His evenings were spent in the study of math-
ematics. According to Mary Everest (HS, p. 6) and other biographers, he studied
Newton, Lagrange, Laplace, Dirichlet, Jacobi and Cauchy very thoroughly,
with no help other than his own will. They say that the works of
these mathematicians were available at the Mechanics Institute of Lincoln, an
institution founded by a local squire. MacHale says that Boole had instead started
his mathematical studies with Lacroix’s book “Differential and Integral Calculus”
and later regretted it. In any case, Boole’s first publications in The Cambridge
Mathematical Journal (see next section) testify that he had mastered Lagrange
and Laplace while he was very young, whether or not he had studied these
authors in the Mechanics Institute.

In 1835 he gave an address in the Mechanics Institute “On the Genius and 
Discoveries of Sir Isaac Newton” [3]. One reads in Boole’s address: 

There was yet another disadvantage attaching to the whole of Newton’s 
physical inquiries, (...) the want of an appropriate notation for express- 
ing the conditions of a dynamical problem, and the general principles 
by which its solution must be obtained. By the labours of LaGrange, the 
motions of a disturbed planet are reduced with all their complication and 
variety to a purely mathematical question. It then ceases to be a physical 
problem; the disturbed and disturbing planet are alike vanished; the ideas 
of time and force are at an end; the very elements of the orbit have dis- 
appeared, or only exist as arbitrary characters in a mathematical formula 
([3], p. 6). 

This quotation shows that Boole, already at the early age of twenty, had 
grasped the ideas which were at the base of his whole methodology: first, that 
a good symbolism was a necessary tool for the advancement of mathematical 
knowledge and secondly, that mathematical manipulation of symbols could be 
separated from interpretation at the intermediate steps of proofs. 

Boole made a trip in 1839 to Cambridge, where he contacted the Scottish 
mathematician Duncan F. Gregory, founder of the Cambridge Mathematical 
Journal. In 1841 Gregory published two of Boole’s papers (see section 3). In 1844, 
his paper “On a General Method in Analysis” was published in the Transactions 
of the Royal Society of London, and awarded the Royal Medal. He maintained 
periodic contact with mathematicians. Gregory died in 1844, but Boole had met 
the mathematician and logician Augustus De Morgan in 1842. From that time 
on they had a most cordial relationship. 

In 1846 a controversy about a logical issue (the quantification of predicates) 
arose between De Morgan and the Scottish philosopher Sir William Hamilton. 
Boole became very interested in it and decided to work out a system of his own; 
the result was MAL. According to De Morgan, this book appeared in public the 
same day that he published his Formal Logic. 

William Hamilton (Hamilton’s logical ideas can be found in [40]) had an
astonishing erudition in many branches of knowledge. Nevertheless, he
had a curious dislike of mathematics. In 1836 he published a paper [26] in re-
sponse to another one written by William Whewell [45], which dealt with the
importance of mathematics in a liberal education.

Hamilton’s paper is worthy of careful consideration because some of the ideas 
that appeared in it were reflected in the Introduction to MAL. The paper attacks 
the opinion that the study of mathematics is important in a liberal education. 
Mathematicians themselves, in Hamilton’s opinion, were not able to reach an 
agreement in regard to the value of mathematics as a gymnastic of the mind. 
Some of them held the opinion that analysis does not constitute such a gymnastic 
because it transports the student mechanically to the conclusions, whereas the 
ancient geometrical constructions led to the end with a clear consciousness of
every step in the procedure. Others, on the contrary, held the view that the 
methods of geometry are tedious. As a result, they recommended the algebraic 
methods as the most favourable to the powers of generalization. 

After having assigned to mathematics a limited place inside logic, Hamilton 
proceeded to make the following assertions. 

(a) It is wholly beyond the domain of mathematics to inquire into the origins 
and nature of its own principles. 

(b) Mathematics does not say anything about necessary truths, but rather about 
necessary inferences. 

(c) The stress on such one-sided disciplines as mathematics produces a dispro- 
portionate development of one power at the expense of others. 

(d) No other discipline tends to cultivate a smaller number of faculties, in a more 
partial or feeble manner, than mathematics. 

To support these assertions, Hamilton gave a display of erudition, citing au- 
thor after author from antiquity to his own time, who had held opinions similar to 
his. One reads in Hamilton’s paper that “mathematics are only difficult because 
they are too easy”, so that no pleasure is found when studying mathematics. 
Hamilton ended his paper with a bitter criticism of the plan of studies followed 
at Cambridge University at that time. 

Boole explicitly mentioned Hamilton’s paper in several places of MAL (MAL, 
pp. 11-14 and p. 81). A footnote in the Introduction to this book shows that 
Boole had considered Hamilton’s arguments very carefully, even to the point of 
discovering a mistake in one of Hamilton’s quotations (MAL, p. 12). 

When discussing Hamilton’s paper, Boole implied that Hamilton’s arguments 
were incorrect because they were based on the opinion that philosophy deals with 
causes while sciences deal with the investigation of laws. 

Boole’s argument was that if the search for causes is a task that does not
transcend the limits of the human intellect, and if the nature of philosophy
is that search, then logic forms no part of philosophy. It is at this point that
Boole made the statement which lies at the base of his whole system of logic:
that logic should not be associated with philosophy but with mathematics.

Regarding Hamilton’s statement that to inquire into the origin and nature 
of its own principles is beyond the domain of mathematics, Boole wrote that if 
this is so, then the same should be stated of logic. But for him, as for Hamilton, 
logic “not only constructs a science but also inquires into the nature and origins 
of its own principles” (MAL, p. 12). Thus his conclusion was that mathematics 
also has the power and right to inquire into its origins and nature. 

Boole had also discussed, in the Introduction of the book, the issue of the 
relevance of symbols in scientific expositions. His aim was probably to correct 
Hamilton’s opinion that the symbolization of mathematics could lead to the 
destruction of the reflective powers of students of that discipline. He contended 
that if symbols are used with full understanding of the laws which render them 
useful, “an intellectual discipline of high order is provided” (MAL, p. 9). 
The reference that Boole made in the Introduction to MAL to the pleasure
that the spirit finds in the mathematical study of both nature and mind may 
have also been intended to correct Hamilton’s idea that mathematics is difficult 
because it is too easy, meaning that no pleasure is afforded by the study of it. 

Hamilton’s paper had the indirect influence of clarifying (by opposition) 
Boole’s ideas about such points as the relevance of symbols and the relative 
roles of logic and mathematics. Boole very probably had definite opinions about 
these issues before he knew of Hamilton’s arguments, but it is almost certain 
that he stated his opinions explicitly because of his desire to correct Hamilton’s 
views. 

Some references in MAL to De Morgan’s logical ideas (MAL, pp. 41 and 82)
show that Boole knew De Morgan’s logic well. Granting such knowledge,
can any influence of De Morgan’s ideas upon Boole’s be traced?

The comparison between MAL and De Morgan’s “Syllogism” [16] makes 
it clear that Boole’s logic was a totally different construct from De Morgan’s. 
Nevertheless, deep coincidences regarding methodological principles are noticed. 
The principles in question are: (a) the crucial importance given to ordinary 
language as guiding the construction of logic; (b) the possibility of a relevant 
improvement of logic by means of its mathematization; and (c) the principle 
of the existence of a universe of discourse embodying terms by pairs, each pair 
being composed of two opposite elements a and not-a. 

As can be seen, the influences of Hamilton’s and De Morgan’s logical con-
ceptions on Boole were of an indirect nature, acting mainly as clarifications or
confirmations of Boole’s already formed opinions. Especially important was the
influence on Boole (by contrast) of Hamilton’s article on the value of mathemat-
ics in education.

In 1849 Boole was appointed professor of mathematics at Queen’s College of 
Cork, Ireland. It seems that De Morgan was instrumental in this appointment. 
In 1855 Boole married Mary Everest. 

George and Mary had five daughters from their marriage, all of whom were 
later to display special abilities. For instance Lucy, the fourth, became the first 
female professor of chemistry in England. 

Mary Everest suggested at several places in her writings, collected in [13],
that a psychological theory of knowledge with religious implications was at the
base of both her husband’s logico-mathematical discoveries and his attitudes to-
wards life (HS, p. 40, [19,20,22]). We next examine Mary Everest’s claims very
briefly; as the reader will see, most of these claims seem at least exaggerated. Nev-
ertheless one gets, after considering them, the impression that they reflect, though
it is difficult to say to what extent, some of Boole’s real feelings and ideology
that may have had their influence on the genesis of his logic.

Mary Everest’s “Boole’s method” reduces to the view that the human mind
always faced, in any problem, pairs of opposite facts, opinions, theories, and
so on related to it, which had to be weighed and contrasted in order to achieve
a synthesis into a superior unity which embodied those opposites. The success
of the process of successive unifications was based on the fact that God, being
One, attracts the human mind, which in that way feels an instinctive impulse
towards Monism. As a matter of curiosity, Mary Everest says that this process
was discussed by Boole with “a learned Jew” ([20], pp. 951-952).

What was Boole’s theory of knowledge, according to Mary Everest?
The best way to describe it is to quote her version of such a theory:

The mind of a man is encased in a mechanism which, besides receiving 
impressions through what we call senses, receives information also from 
some source, invisible and undefinable, access to which opens whenever 
the mind, after a period of tension on the difference, contrast or conflict 
between any elements of thought, turns to contemplate the same elements 
as united or as forming parts of unity ([22], p. 795). 

In particular, regarding man’s psychology, she says:

But he seemed to assume, as the first of salutary facts, that there is direct 
contact between the Divine Magnetism and the nervous system of man 
([22], p. 795). 

Boole’s logic, based on the fundamental equation $x^2 = x$, was then an ex-
pression of that philosophy, such an equation being formed by the two opposite
elements $(1 - x)$ and $x$, the sum of which gives 1, the universe of discourse [21].
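The link between the fundamental equation and the pair of opposites can be written out in one line. This is a short worked step in modern notation, added as an illustration and not taken from MAL or from Mary Everest’s writings:

```latex
x^2 = x \iff x - x^2 = 0 \iff x(1 - x) = 0,
\qquad\text{while}\qquad x + (1 - x) = 1 .
```

Read this way, the two opposite elements $x$ and $1 - x$ have nothing in common (their product is 0) and together make up the universe of discourse (their sum is 1).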

Regarding Boole’s religious beliefs, according to Mary Everest’s testimony, 
Boole was close to Unitarianism. Being convinced that God was the only impor- 
tant matter, he considered particular religious creeds as sources of divisions (HS, 
p. 3). Thus the true intellectual, and the true religious man, should be impartial 
(HS, p. 43). One reads in HS that Boole’s impartiality made him become, by 
unanimous acclamation, “a referee in all parties” (HS, p. 24). 

No biographer of Boole has taken into consideration the claims that Mary
Everest made in regard to the psychological and religious origins and aims of
Boolean logic; some have even implied that her judgment was unsound ([38,28];
see also [20], p. 955, where Mary Everest tells of her lack of success in
trying to convince scientific people of the existence of a religious message in
Boole’s logic). Nevertheless there are some arguments that would support the
opinion that Mary Everest was basically reflecting the truth, although, as we
have speculated above, in a way that was quite exaggerated or distorted.

One argument is inferred from the internal coherence which exists between 
HS, which has been recognized as reliable by all biographers of Boole, and the 
rest of Mary Everest’s writings, especially in those points which referred to what 
she called “Boole’s method” . Some others are provided by the study of several of 
Boole’s own writings, by the consideration of issues such as Boole’s community 
of opinions with other intellectuals he was in contact with, his ideas as a young 
man expressed in his address on Newton and by the study of the very basic 
ideas underlying the construction of his logic as presented in MAL. For reasons
of space we deal very succinctly with only the last two of these items, referring
the reader to [34] for a more elaborate argument.
In his address, Boole said of Newton:

It is generally supposed that his attention was directed to this subject by
observing the falling of an apple. If this tradition be correct, it strongly
teaches us what effects may arise from trivial or common occurrences,
when the latent energies of nature or of mind are thereby roused into ac-
tion. The falling of an apple was an every-day occurrence, yet its moral
consequences have been to all human appearance, greater than the down-
fall of an empire. It had touched upon some hidden spring, some sleeping
and folded energy: a train of thought was excited, which, though inter-
rupted, was never abandoned, until the foundation was laid of the great
science of Physical Astronomy ([3], p. 10).

Thus Boole, already in 1835, believed in “latent energies of the mind” and
“folded energies”, concepts that resemble that “source invis-
ible and undefinable” to which “Boole’s method” refers. Moreover, the address
contains many references to the history of ancient philosophies, by which Boole
illustrates the idea that the existence of error proves the existence of truth.

But there is more, for Boole cited Zoroaster. Does this fact not imply that he
had been interested in dualistic philosophies since he was a young man
([3], pp. 21-22)? That Boole knew and had meditated on ancient thought
is also inferred from a quotation in MAL (MAL, p. 49), and from the testimony
of some of his biographers ([27], p. 428, and Taylor, one of Boole’s grandsons, [44],
p. 47).

Regarding MAL, it ought to be recognized that the idea of reaching unities 
from the contemplation of opposites is repeatedly used by Boole in important 
parts of his book (MAL: pp. 40, 49-50, 52, 64, 65, 77). 

Let us transcribe one of these paragraphs as an illustration.

Consider what are those distinct and mutually exclusive cases of which
it is implied in the statement of the given Proposition, that some one of
them is true, and equate the sum of their elective expressions to unity.
This will give the equation of the given Proposition (MAL, p. 52).
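The rule quoted above can be tried on a small case. The following is a minimal Python sketch, not a reconstruction of Boole’s own procedure: the sample proposition (“Either X is true or Y is true”, read inclusively), the function names, and the restriction of the symbols to the values 0 and 1 are all assumptions made for illustration.

```python
from itertools import product

def elective(case):
    """Elective expression of one case, e.g. {'X': True, 'Y': False}.

    A proposition taken as true contributes its symbol s, one taken as
    false contributes (1 - s), so the case above becomes x * (1 - y).
    """
    def expr(values):
        out = 1
        for name, is_true in case.items():
            s = values[name]
            out *= s if is_true else (1 - s)
        return out
    return expr

def equation_holds(cases, values):
    """The quoted rule: the sum of the elective expressions equals 1."""
    return sum(elective(c)(values) for c in cases) == 1

# Illustrative proposition (an assumption): "Either X is true or Y is true".
# It leaves three distinct and mutually exclusive cases.
cases = [
    {"X": True, "Y": False},   # x(1 - y)
    {"X": False, "Y": True},   # (1 - x)y
    {"X": True, "Y": True},    # xy
]

for x, y in product((0, 1), repeat=2):
    values = {"X": x, "Y": y}
    # With 0/1 values the equation holds exactly when X or Y is true.
    assert equation_holds(cases, values) == bool(x or y)
```

Summing the three expressions by hand gives x + y - xy = 1, which is the equation the rule assigns to this proposition under the assumptions stated.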

Summarizing, it seems that some psychological and religious ideas contributed
to the genesis of Boole’s logic. Probably they were not as influential in this gene-
sis as Mary Everest suggested, but there are arguments to suppose they existed.

At one time (1860), the possibility of Boole being nominated Professor of 
Mathematics at Oxford almost became a reality. But he sent only his name to 
be included in the list of candidates and the post was assigned to another man. 
Thus he spent the rest of his short life as a professor in Cork. He was famous 
for his knowledge, kindness, and total lack of egoism. 

As has been noted above, in 1864 Boole made a trip to London. On his 
return he was almost completely exhausted. One day in November of that year 
he walked from his house to the College under heavy rain, and lectured in wet 
clothes. The result was a bad cold which terminated his life a few days later, on 
December 8, 1864. 
3 Boole, Forerunner of Symbolic Computation

3.1 The Method of Separation of Symbols 

The philosophy underlying the method of separation of symbols (to be denoted 
hereinafter as “mss”) consisted of separating symbols of operation from their 
subjects of application and operating with the former as with algebraic entities.

The mss had been suggested by several French and British mathematicians 
working in the second half of the 18th and the first half of the 19th century. It 
seems to us that Duncan F. Gregory was the one who most clearly stated how the 
method works. But it was Boole who took the mss to its ultimate consequences: 
in particular, Boole’s logic was one of the branches of mathematics suggested by 
the mss. 

The historical development of the method has been exhaustively studied by 
Koppelman [30], Knobloch [29], Panteki [41] and Grattan-Guinness [23] as part 
of the history of symbolic calculi. We have also studied it in [32], [36] and [37]. 
In this section we refer just to Gregory because this is enough to determine the 
immediate background of the influence of the mss on the genesis of Boole’s logic. 

Two of Gregory’s papers are examined next to determine how the method 
works. 

The first article under consideration appeared in 1838 [24]. Gregory deter- 
mines in this paper the symbolic laws used in Newton’s binomial scheme. Gregory 
notes that Euler used only the following laws of combination of symbols in the 
general application of the binomial development. 

– Commutative law: $ab = ba$.

– Distributive law: $c(a + b) = ca + cb$.

– Index law: $a^m a^n = a^{m+n}$.



Gregory states that since it can be proved that the operations of differential 
calculus and of the calculus of finite differences are subject to those laws, it can 
be assumed that Newton’s binomial development is valid for such calculi,
which means that it is not necessary to repeat the proof for each particular 
case. Let us consider an example of application taken from Gregory’s article: the
determination of the nth derivative of a product of functions u · v. The
derivative $\frac{d}{dx}(u \cdot v)$ can be written:

$$\frac{d}{dx}(u \cdot v) = u\,\frac{dv}{dx} + v\,\frac{du}{dx} = \left(\frac{d'}{dx} + \frac{d}{dx}\right) uv,$$

where $\frac{d'}{dx}$ is an operation upon v but not upon u, and $\frac{d}{dx}$ is an operation upon
u but not upon v.

So, the n-th differential may be considered as a power of just the expression 
in parentheses with no attention paid to u and v. Gregory says that the result 
is also valid when n is fractional or negative. 
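The step from the first derivative to the n-th can be checked mechanically. The following is a minimal Python sketch, not taken from Gregory’s article: it expands the n-th power of the sum of the two operators (one acting only on u, the other only on v) by the binomial theorem and compares the result with direct differentiation of the product. Representing polynomials as coefficient lists, and all function names, are assumptions made for illustration. Only non-negative integer n is covered, not the fractional or negative indices mentioned above.

```python
from math import comb

# A polynomial is a list of integer coefficients, lowest degree first:
# [1, 0, 3] stands for 1 + 3x^2, and [] stands for the zero polynomial.

def trim(p):
    """Drop trailing zeros so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def scale(c, p):
    return trim(c * a for a in p)

def diff(p, times=1):
    """Differentiate p with respect to x the given number of times."""
    p = list(p)
    for _ in range(times):
        p = [k * a for k, a in enumerate(p)][1:]
    return trim(p)

def leibniz(u, v, n):
    """Apply the binomial expansion of the operator sum to u * v.

    The term with coefficient C(n, k) differentiates u k times and
    v (n - k) times, since each operator acts on one factor only.
    """
    total = []
    for k in range(n + 1):
        term = scale(comb(n, k), mul(diff(u, k), diff(v, n - k)))
        total = add(total, term)
    return total

u = [1, 2, 0, 1]   # 1 + 2x + x^3 (arbitrary sample)
v = [0, -1, 4]     # -x + 4x^2   (arbitrary sample)
for n in range(6):
    assert leibniz(u, v, n) == diff(mul(u, v), n)
```

The check passes for these samples because the two operators commute and distribute over sums, which is exactly what the binomial development requires.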

In the article [25], Gregory proposes a characterization of symbolic algebra, 
a characterization suggested to Gregory by his considerations on the mss. 
... it is the science which treats of the combination of operations defined
not by their nature, that is, by what they are or what they do, but by 
the laws of combination to which they are subject. And as many different 
kinds of operations may be included in a class defined in the manner I 
have mentioned, whatever can be proved of the class generally, is neces- 
sarily true of all the operations included under it ([25], p. 208). 

Regarding the reasons for accepting those laws of combination, Gregory says: 

It is true that these laws have been in many cases suggested (as Mr. Pea- 
cock has aptly termed it) by the laws of the known operations of number; 
but the step which is taken from arithmetical to symbolical algebra is, 
that, leaving out the view of the nature of the operations which the sym- 
bols we use represent, we suppose the existence of classes of unknown 
operations subject to the same laws ([25], p. 208). 

As Pycior notes [42], Peacock’s philosophy of mathematics went farther than
his own mathematical work. While advocating freedom in algebraic calculation,
he was not able to let his work be free because of his need of justifying on the
grounds furnished by known mathematics the laws of combination to which the
symbols were submitted. Gregory shared Peacock’s idea of freedom in algebra,
but not his need for such a limitation. Gregory accepts that even though the
different sets of laws which belong to symbolic algebra are suggested by known 
mathematics, there may be operations subject to the same laws that are not yet 
well known (but soundly guessed). This allows for the possibility of discovery 
and construction of different particular algebras, each of them an instantiation of
a part of symbolic algebra. Then both known mathematical operations and oth- 
ers not yet established (but soundly guessed as said above) may be divided into 
classes, in such a way that operations that obey formally identical laws belong 
to the same class. In this context, a theorem is a symbolically expressed result 
obtained by applying to the operations in a class any mathematical procedure 
which is permissible inside that class (“permissible” means any procedure valid 
for the known operations that belong to the class). Nevertheless, it is important 
to note the remark that Gregory makes: the theorems are true in a particu- 
lar branch of algebra, “provided always that the resulting combinations are all 
possible in the particular operation under consideration” ([25], p. 208). 



3.2 Boole and the mss 

In 1840, Boole sent Gregory two papers for possible publication in The Cambridge
Mathematical Journal. Gregory, after suggesting some changes, published
them in 1841 [4,5]. 

It is very likely that Gregory saw an astonishing resemblance with what he 
himself was doing, in both the underlying philosophy of Boole’s papers and par- 
ticular details. It was then that he must have informed Boole of the specific 
terms of the mss. This guess is supported by reading Boole’s third published 
paper [6], where he explicitly refers, on page 115, to Gregory’s three laws.
Moreover, Boole suggests at the end of his paper that the method could be improved
if new algebraic processes were found. 

Boole improved the method in his longest and most mature paper [7]. The 
following statement at the beginning of the paper is of interest to what we will 
go on to say in the next section. 

Mr Gregory lays down the fundamental principle of the method in these 
words “there are a number of theorems in ordinary algebra, which, though 
apparently proved to be true only for symbols representing numbers, admit 
of a much more extended application”. Such theorems depend only on the 
laws of combination to which the symbols are subject, and are therefore 
true for all symbols, whatever their nature may be, which are subject to 
the same laws of combination. The laws of combination which have been 
hitherto recognized are the following, p and r being symbols of operation 
and u and v subjects. 1. The commutative law, whose expression is $pru =
rpu$. 2. The distributive law, $p(u+v) = pu+pv$. 3. The index law, $p^m p^n u =
p^{m+n} u$. Perhaps it may be worth while to consider whether the third law
does not rather express a necessity of notation, arising from the use of
general indices, than any property of the symbol ([7], p. 225).

Boole mentions Gregory’s three laws, but adds that these are the ones recog- 
nized until now, implying that there may be others (see for instance a paper of 
1846 [8] and another of 1847 [9] where he presents quite complex symbolic laws 
to find a solution for Laplace’s equation).

Next we shall see, first, that the first principles of the logic as they appear at 
the beginning of his MAL are an almost direct transcription of laws suggested in 
the method of separation of symbols, and, second, that the inference procedure 
consists of applying developments of functions in the equations which transcribe 
the premises on which such inference is based. 

Boole writes at the beginning of his first treatise on logic: 

Further, let us conceive of a class of symbols x, y, z possessed of the fol- 
lowing character. The symbol x, operating upon any subject comprehend- 
ing individuals or classes, shall be supposed to select from that subject 
all the Xs which it contains. (...) When no subject is expressed, we shall 
suppose 1 (the Universe) to be the subject understood, so that we shall 
have: x = x(1), the meaning of either term being the selection from the
Universe of all the Xs which it contains and the result of the opera- 
tion being in common language, the class X, i.e. the class of which each 
member is an X . (...) 

1st. The result of an act of election is independent of the grouping or classi- 
fication of the subject... 

x(u + v) = xu + xv

2nd. It is indifferent in what order two successive acts of election are per-
formed. (...)



xy = yx 



3rd. The result of a given act of election performed twice, or any number of 
times in succession, is the result of the same act performed once. (...) 



The third law ($x^2 = x$) we shall denominate the index law. It is peculiar to
elective symbols and will be found of great importance in enabling us to reduce 
our results to forms meet for interpretation (MAL, pp. 15-18).

Logic turns out to be a calculus governed by the same laws as some of those 
in the method of separation of symbols. 
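
The three laws can be made concrete with a small sketch. It assumes, purely for illustration, that an elective symbol is modelled as an operator selecting from a finite set those members that satisfy a predicate; the names `elective`, `universe`, `u` and `v` are ours, not Boole's, and the union of two disjoint subjects stands in for his `u + v`.

```python
# Elective symbols modelled as selection operators on finite sets.
# This model is an assumption made for illustration only.

def elective(predicate):
    """Return an operator selecting the members of a subject that satisfy predicate."""
    return lambda subject: frozenset(e for e in subject if predicate(e))

universe = frozenset(range(1, 31))          # plays the role of Boole's 1
x = elective(lambda n: n % 2 == 0)          # selects the even numbers
y = elective(lambda n: n % 3 == 0)          # selects the multiples of 3

u = frozenset(range(1, 11))
v = frozenset(range(11, 21))                # disjoint from u, so u | v stands for u + v

distributive = x(u | v) == x(u) | x(v)            # x(u + v) = xu + xv
commutative = x(y(universe)) == y(x(universe))    # xy = yx
index_law = x(x(universe)) == x(universe)         # x^2 = x
```

In this model the product xy is the selection of the multiples of 6, and all three flags evaluate to true.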

From the examination of Gregory’s and Boole’s work, we may infer that the 
mss worked as follows. First, symbolic algebra classifies known calculi accord- 
ing to the laws of their combinations. Then one proceeds to examine a new 
piece of knowledge: if it obeys formally identical laws of combination of symbols 
such as those of a known class of calculus, it is placed inside that same class. 
Thereafter, all the mathematical processes that are permissible inside this class 
lead to theorems in the new theory (provided that the resulting theorems are 
interpretable). 

Boole had the intuition that logic was a candidate for becoming a part of
symbolic algebra. Basing his statement about the laws of
combination of the logical symbols on his own study of mental processes, he 
found that these laws were precisely the distributive, commutative, and index 
laws. 

The choice of algebraic processes, especially the solution of systems of equations
and, curiously, MacLaurin series expansions (MAL, p. 70), as tools for producing
proofs was not a mere coincidence, since these processes were allowed inside the
class of calculi based on the same three laws.

3.3 A CA Approach to Boole’s Inference Procedures 

This subsection reexamines Boole’s ideas on inference from today’s computer 
algebra (CA) point of view as follows. 

First, Boole’s use of MacLaurin series expansions in the translation of logical
formulae into polynomials can be justified and extended using a CA system.
Second, his method of inference in “hypotheticals” (propositional calculus) can
be related directly to a polynomial ideal membership problem. Third, the final results
of his method of inference in “categoricals” (a part of monadic first order logic),
also based on MacLaurin series expansions, can be emulated using CA too (if
an appropriate interpretation of Boole’s logical symbols is made).

A Maple implementation for the first item and a CoCoA one for the second
are used. For reasons of space we do not deal with the third item. The choice
of two different languages, Maple and CoCoA, is due only to the fact that each
of them has some advantages over the other in one of the two calculations needed
here. CoCoA is particularly effective when calculating Gröbner bases.



Logical Statements as Polynomials. Boole presented the following polyno- 
mial translation of the basic formulae of logic in chapter 5 (pp. 48-59) of MAL: 

- Not X: 1 - x.

- X and Y: xy.

- X or Y (not exclusive): x + y - xy.

- X or Y (exclusive): x + y - 2xy.

- If X then Y: x(1 - y) = 0, that is, 1 - x + xy = 1, giving the polynomial
1 - x + xy.
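
The translations listed above can be checked against the usual two-valued truth tables. The following is a minimal sketch, assuming the modern reading 1 = true and 0 = false; the dictionary `translations` and the helper `agrees` are names introduced here for illustration.

```python
from itertools import product

# Each entry pairs Boole's polynomial with the corresponding Boolean connective.
translations = {
    "and": (lambda x, y: x * y, lambda a, b: a and b),
    "or": (lambda x, y: x + y - x * y, lambda a, b: a or b),
    "xor": (lambda x, y: x + y - 2 * x * y, lambda a, b: a != b),
    "implies": (lambda x, y: 1 - x + x * y, lambda a, b: (not a) or b),
}

def agrees(poly, connective):
    """True if the polynomial reproduces the truth table on all 0/1 inputs."""
    return all(
        poly(x, y) == int(connective(bool(x), bool(y)))
        for x, y in product((0, 1), repeat=2)
    )

results = {name: agrees(poly, conn) for name, (poly, conn) in translations.items()}
negation_ok = all(1 - x == int(not x) for x in (0, 1))
```

Every entry of `results` is true, as is `negation_ok`: on the values 0 and 1 each polynomial takes exactly the truth value of the statement it translates.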

These translations follow directly from a proper use of the equalities (1) 
and (2) below, which appear in chapter 6 of MAL. Curiously indeed, Boole 
reached these equalities by basing his argument on MacLaurin series expansion 
of functions, so far away from the standard mathematical bases of today’s logic. 

The series expansion of an elective function of just one elective symbol x, 
gives (MAL, p. 61): 

$$f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{1 \cdot 2}\,x^2 + \cdots$$

that, under the condition $x = x^2$, leads to:

$$f(x) = f(0) + \alpha x \qquad (1)$$

and $\alpha$ can be calculated (by hand or using a Maple program) from (1):

$$\alpha = f(1) - f(0) \qquad (2)$$

Similarly, the series expansion of a function in two elective symbols gives an 
expression of the form (MAL, p. 62): 

$$f(x, y) = f(0,0) + \alpha x + \beta y + \delta xy \qquad (3)$$

Once the values of $\alpha$, $\beta$, and $\delta$ have been found (by hand or using a computer),
it is straightforward to check that Boole’s translations for “not”, “or”, “and” 
and “implies” follow from his MacLaurin’s expansions of elective functions as 
advanced at the beginning of this section. 

Boole did not introduce truth values explicitly, but these were implicit in the 
expressions of elective functions. For instance, the polynomial translation 1 - x
of “Not(X)” follows directly from (1) and (2) if f(1) = 0 and f(0) = 1. Similarly,
the polynomial translation x + y - xy of “X or Y” follows from (3) if f(0, 0) = 0 and
f(0, 1) = f(1, 0) = f(1, 1) = 1.
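
The calculation of the coefficients can be sketched in a few lines. The formula for the one-symbol coefficient is (2); the formulas used below for the three coefficients of (3) are our own derivation, obtained by substituting the four pairs of values 0 and 1 into (3), and the function and table names are hypothetical.

```python
def expand_one(f):
    """Coefficients (f(0), alpha) of f(x) = f(0) + alpha*x, with alpha = f(1) - f(0)."""
    f0 = f(0)
    return f0, f(1) - f0

def expand_two(f):
    """Coefficients (f(0,0), alpha, beta, delta) of f(0,0) + alpha*x + beta*y + delta*x*y."""
    f00 = f(0, 0)
    alpha = f(1, 0) - f00                    # from the substitution x = 1, y = 0
    beta = f(0, 1) - f00                     # from the substitution x = 0, y = 1
    delta = f(1, 1) - f00 - alpha - beta     # from the substitution x = 1, y = 1
    return f00, alpha, beta, delta

# Truth tables with 1 = true and 0 = false.
NOT = {0: 1, 1: 0}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
IMPLIES = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}

not_coeffs = expand_one(lambda x: NOT[x])                    # 1 - x
or_coeffs = expand_two(lambda x, y: OR[(x, y)])              # x + y - xy
implies_coeffs = expand_two(lambda x, y: IMPLIES[(x, y)])    # 1 - x + xy
```

The resulting coefficient tuples reproduce Boole's translations of “not”, “or” and “implies”.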

Boole’s use of MacLaurin series expansions can also be applied to many- 
valued logic. This is a result that he could not imagine. Let us see how. 
Fig. 1. Truth tables for Kleene’s three-valued logic



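For a function in just one elective symbol, under the condition $x^3 = x$, one
obtains the following MacLaurin series expansion:

$$f(x) = f(0) + x \cdot \left( f'(0) + \frac{f'''(0)}{1 \cdot 2 \cdot 3} + \cdots \right) + x^2 \cdot \left( \frac{f''(0)}{1 \cdot 2} + \frac{f''''(0)}{1 \cdot 2 \cdot 3 \cdot 4} + \cdots \right)$$

that is, an expression of the form:

$$f(x) = f(0) + \alpha x + \beta x^2$$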



For a function in two elective symbols and under the same condition $x^3 = x$,
one obtains the expression:

$$f(x, y) = f(0,0) + \alpha x + \beta y + \delta xy + \epsilon x^2 + \eta y^2 + \theta x^2 y + \iota x y^2 + \kappa x^2 y^2$$



As above, $\alpha$, $\beta$, $\delta$, $\epsilon$, $\eta$, $\theta$, $\iota$, and $\kappa$ can be calculated by a Maple program. For
instance:

$$\alpha = 2f(1) - \tfrac{3}{2} f(0) - \tfrac{1}{2} f(2)$$

As an illustration we refer to Kleene’s three-valued logic. Its truth table values
can be found in Figure 1 (0, 1 and 2 respectively mean “false”, “indeterminate”
and “true”). 

Let us denote the function $f(x)$ for negation as $f_\neg(x)$. As $f_\neg(0) = 2$, $f_\neg(1) =
1$, $f_\neg(2) = 0$, by applying the values for $\alpha, \beta, \delta, \ldots$ obtained as above, one gets:
$f_\neg(x) = 2 - x$, and similarly for $f_\vee(x,y)$, $f_\wedge(x,y)$, and $f_\to(x,y)$.
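
The one-symbol calculation for negation can be reproduced with exact rational arithmetic. The sketch below takes the expression for the first coefficient from the text; the expression for the coefficient of the squared term is our own, obtained from the substitution x = 1, and the function name is hypothetical.

```python
from fractions import Fraction

def expand_three_valued(f0, f1, f2):
    """Coefficients of f(x) = f0 + alpha*x + beta*x**2 from the values at x = 0, 1, 2."""
    alpha = 2 * Fraction(f1) - Fraction(3, 2) * f0 - Fraction(1, 2) * f2
    beta = Fraction(f1) - f0 - alpha         # from f(1) = f(0) + alpha + beta
    return Fraction(f0), alpha, beta

# Negation in the three-valued logic: f(0) = 2, f(1) = 1, f(2) = 0.
c, alpha, beta = expand_three_valued(2, 1, 0)

# The expansion reproduces the three given values of the function.
reproduced = [c + alpha * x + beta * x * x for x in (0, 1, 2)]
```

Here `alpha` is -1 and `beta` is 0, so the expansion is 2 - x, in agreement with the text.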



Modern Interpretation of Boole’s Ideas. Boole made explicit, as basic laws
of combination of symbols, only the commutative, distributive, and his special
index law $x^2 = x$. But he also used products, sums, and, implicitly, opposite
elements for the sum ($x + (-x) = x^2 - x = 0$). Thus, he was implicitly working
in a polynomial quotient ring, actually the ring $A = \mathbb{Q}[x, y, z, \ldots, w]/I$, where $I$ is
the ideal $I = \langle x^2 - x, y^2 - y, z^2 - z, \ldots, w^2 - w \rangle$. The ideal $I$ expresses Boole’s
index law.

$\mathbb{Z}_2$ (which can be extended to $\mathbb{Z}_p$, $p$ being a prime number) plays in our CA
approach the role of $\mathbb{Q}$. This is not an essential change from Boole’s approach
because he, even though allowing 2, 3, etc. as coefficients (for example in his
exclusive “or”, translated as x + y - 2xy), required the final formulae to take only
the values 0 and 1. These final values of the polynomial expression of Boole’s
basic logical statements remain unchanged under the following changes:
- Replace $\mathbb{Q}[x, y, z, \ldots, w]/I$ by $\mathbb{Z}_2[x, y, z, \ldots, w]/I$.

- Exclusive “or”: replace x + y - 2xy by x + y.

- Inclusive “or”: replace x + y - xy by x + y + xy.

- “Implies”: replace 1 - x + xy by 1 + x + xy.
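
That these changes leave the final values untouched can be confirmed by direct evaluation. The sketch below compares, on all 0/1 inputs, each polynomial with rational coefficients against its counterpart reduced modulo 2; the dictionary `pairs` is a name introduced here.

```python
from itertools import product

# (polynomial over Q, polynomial whose value is to be read modulo 2)
pairs = {
    "xor": (lambda x, y: x + y - 2 * x * y, lambda x, y: x + y),
    "or": (lambda x, y: x + y - x * y, lambda x, y: x + y + x * y),
    "implies": (lambda x, y: 1 - x + x * y, lambda x, y: 1 + x + x * y),
}

same_values = {
    name: all(
        over_q(x, y) == over_z2(x, y) % 2
        for x, y in product((0, 1), repeat=2)
    )
    for name, (over_q, over_z2) in pairs.items()
}
```

All three entries of `same_values` are true.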

MacLaurin series expansion of functions and Boole’s insight justified his
translations of statements into polynomials. With today’s knowledge of truth
tables, the same translations can be obtained by directly determining the
coefficients of a polynomial in $\mathbb{Z}_2[x_1, x_2, \ldots, x_n]/I$. The process is
somewhat involved, so the reader is referred to [35] for details.

In particular, for Kleene’s three-valued logic, the translation into (classes of)
polynomials in $A = \mathbb{Z}_3[x, y, z, \ldots, w]/I$, $I = \langle x^3 - x, y^3 - y, z^3 - z, \ldots, w^3 - w \rangle$, is:

- $f_\neg(q) = (2 - q) + I$

- $f_\vee(q, r) = (q^2 r^2 + q^2 r + q r^2 + 2qr + q + r) + I$

- $f_\wedge(q, r) = (2q^2 r^2 + 2q^2 r + 2q r^2 + qr) + I$

- $f_\to(q, r) = (q^2 r^2 + q^2 r + q r^2 + 2q + 2) + I$

These polynomials are the same as would have resulted by applying Boole’s 
MacLaurin series. 
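
These four polynomials can be checked by evaluating them modulo 3 on all truth values. The sketch assumes the usual strong Kleene tables (negation as 2 - q, disjunction as maximum, conjunction as minimum, implication as the maximum of 2 - q and r), since the tables of Figure 1 are not reproduced here; the function names are ours.

```python
from itertools import product

P = 3  # number of truth values: 0 false, 1 indeterminate, 2 true

def f_not(q):
    return (2 - q) % P

def f_or(q, r):
    return (q*q*r*r + q*q*r + q*r*r + 2*q*r + q + r) % P

def f_and(q, r):
    return (2*q*q*r*r + 2*q*q*r + 2*q*r*r + q*r) % P

def f_imp(q, r):
    return (q*q*r*r + q*q*r + q*r*r + 2*q + 2) % P

grid = list(product(range(P), repeat=2))
checks = {
    "not": all(f_not(q) == 2 - q for q in range(P)),
    "or": all(f_or(q, r) == max(q, r) for q, r in grid),
    "and": all(f_and(q, r) == min(q, r) for q, r in grid),
    "implies": all(f_imp(q, r) == max(2 - q, r) for q, r in grid),
}
```

Under the assumed tables every entry of `checks` is true.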



Boole’s Inference Methodology. Boole says in MAL (p. 55): “The
treatment of every form of hypothetical Syllogism will consist in forming the
equations of the premises, and eliminating the symbol or symbols which are found
in more than one of them. The result will express the conclusion.” Let us see
an example (MAL, p. 56, 5th example):

- If X is true, Y is true: x(1 - y) = 0.

- If W is true, Z is true: w(1 - z) = 0.

- Either X is true, or W is true: x + w - xw = 1.

From these equations, eliminating x and w, we have: y + z - yz = 1, which expresses
the conclusion, in Boole’s words, “Either Y is true, or Z is true, the members
being non-exclusive”.
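
A brute-force check of this example is straightforward. The sketch below enumerates all assignments of 0 and 1 to x, y, z, w, keeps those that satisfy the three premise equations, and verifies that the equation for “Either Y is true, or Z is true” holds on each of them; the helper name `premises_hold` is ours.

```python
from itertools import product

def premises_hold(x, y, z, w):
    """The three premise equations of the 5th example, read on 0/1 values."""
    return x * (1 - y) == 0 and w * (1 - z) == 0 and x + w - x * w == 1

models = [v for v in product((0, 1), repeat=4) if premises_hold(*v)]

# The conclusion "Either Y is true, or Z is true" as the equation y + z - yz = 1.
conclusion_follows = all(y + z - y * z == 1 for x, y, z, w in models)
```

The list `models` is non-empty, so the premises are satisfiable, and `conclusion_follows` is true.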

Boole calls “elimination” the following process: given

ay + b = 0
a'y + b' = 0

multiply the second equation by a and the first by a', and perform the subtrac-
tion, obtaining ab' - a'b = 0.
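
As a worked illustration of this rule (our own arrangement of the computation, not a transcription of MAL), the fifth example can be carried through in two elimination steps. In each step the symbol being eliminated plays the role of y in the rule above.

```latex
% Step 1: eliminate x from  x(1-y) = 0  and  x + w - xw = 1.
% Written as  a x + b = 0  and  a' x + b' = 0:
%   a = 1 - y,  b = 0;   a' = 1 - w,  b' = w - 1.
\[
  ab' - a'b = (1-y)(w-1) - (1-w)\cdot 0 = 0
  \quad\Longrightarrow\quad (1-y)(1-w) = 0 .
\]
% Step 2: eliminate w from  w(1-z) = 0  and  (1-y)(1-w) = 0.
% Written as  a w + b = 0  and  a' w + b' = 0:
%   a = 1 - z,  b = 0;   a' = -(1-y),  b' = 1 - y.
\[
  ab' - a'b = (1-z)(1-y) + (1-y)\cdot 0 = 0
  \quad\Longrightarrow\quad 1 - y - z + yz = 0
  \quad\Longrightarrow\quad y + z - yz = 1 .
\]
```

The final equation is the polynomial form of “Either Y is true, or Z is true”.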

Note that the negation of the conclusion (Boole negates an expression by making
it equal to 0) turns out to be an algebraic combination of the negations of the
premises, a fact which will be of the utmost importance in the extension of Boole’s
idea presented next.

Such a “Boolean” idea can be extended to the following theorem, which is 
stated without proof (see again [35]). The theorem refers to any p-valued logic, 
p being a prime number (and p - 1 the truth value corresponding to “true”).
Definition 1. A propositional formula $A_0$ is a tautological consequence of the
propositional formulae $A_1, A_2, \ldots, A_m$, denoted $\{A_1, A_2, \ldots, A_m\} \models A_0$, if and
only if, for any truth-valuation $v$ such that $v(A_1) = v(A_2) = \cdots = v(A_m) = p - 1$,
also $v(A_0) = p - 1$.
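
Definition 1 can be tested directly by enumerating truth-valuations, which is feasible for a handful of variables. The sketch below is such an enumeration; the function name and the three-valued connectives used in the examples (implication as the maximum of 2 - q and r, disjunction as maximum) are assumptions made for illustration.

```python
from itertools import product

def is_tautological_consequence(premises, conclusion, n_vars, p):
    """Check Definition 1 by enumerating all p**n_vars truth-valuations."""
    top = p - 1  # the truth value standing for "true"
    for valuation in product(range(p), repeat=n_vars):
        if all(a(*valuation) == top for a in premises):
            if conclusion(*valuation) != top:
                return False
    return True

# Assumed three-valued connectives (0 false, 1 indeterminate, 2 true).
def implies(q, r):
    return max(2 - q, r)

def either(q, r):
    return max(q, r)

# From Q and "Q implies R", conclude R.
modus_ponens = is_tautological_consequence(
    [lambda q, r: q, lambda q, r: implies(q, r)], lambda q, r: r, n_vars=2, p=3)

# From "Q or R" alone, Q does not follow.
q_from_or = is_tautological_consequence(
    [lambda q, r: either(q, r)], lambda q, r: q, n_vars=2, p=3)
```

The first check succeeds and the second fails, as expected. Theorem 1 below replaces this exponential enumeration by an ideal membership test.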



Theorem 1. A formula $A_0$ is a (tautological) consequence of other formulae
$A_1, \ldots, A_m$ if and only if the polynomial that translates the negation of $A_0$ be-
longs to the ideal generated by the polynomials that translate the negations of
$A_1, \ldots, A_m$ and the polynomials $x_1^p - x_1, x_2^p - x_2, \ldots, x_n^p - x_n$.

This theorem has been proved quite recently, independently of Boole’s sug- 
gestions (see Alonso et al. [2], Chazarain et al. [12], Roanes-Lozano et al. [43] 
and Laita et al. [35]). What is claimed here is that the theorem both validates
Boole’s approach to inference in hypotheticals and is an almost natural exten-
sion of this approach. The proof in [35] uses a quotient ring with respect to the
ideal $I$ (which corresponds to Boole’s introduction of the law $x^2 = x$).

In computer algebra, the way to check if a polynomial belongs to an ideal is 
finding if the Normal Form (NF) of the polynomial, modulo the ideal, is 0 (see
for instance [1]). 

The theorem can be applied to the study of consistency as follows. 

A set of propositional formulae $\{A_1, A_2, \ldots, A_m\}$ is inconsistent if

$\{A_1, A_2, \ldots, A_m\} \models A$

$A$ being any formula of the language in which $A_1, A_2, \ldots, A_m$ are written. This is
expressed in terms of ideals by requiring that the ideal $J$ generated by the negations
of $A_1, A_2, \ldots, A_m$, together with $I$, be the whole ring (i.e. $1 \in J + I$), which means
that any formula $A$ is a (tautological) consequence of $A_1, \ldots, A_m$. That 1 belongs to
an ideal is expressed in computer algebra by stating that the Gröbner basis of the
ideal is $\{1\}$.
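
A small two-valued illustration, with premises chosen by us (X, “X implies Y” and “not Y”), shows what the condition means without computing a Gröbner basis. The sketch relies on two things: the standard fact that, once the polynomials of the index law are in the ideal, the ideal is the whole ring exactly when its generators have no common zero among the 0/1 points; and an explicit, hand-made combination of the generators that equals 1.

```python
from itertools import product

# Negations (1 - polynomial) of the hypothetical premises.
generators = [
    lambda x, y: 1 - x,          # negation of X
    lambda x, y: x * (1 - y),    # negation of "X implies Y", i.e. of 1 - x + xy
    lambda x, y: y,              # negation of "not Y", i.e. of 1 - y
]

# No 0/1 point makes all three negations vanish at once.
common_zeros = [
    pt for pt in product((0, 1), repeat=2)
    if all(g(*pt) == 0 for g in generators)
]

# Hand-made certificate: 1 = 1*(1 - x) + 1*x*(1 - y) + x*y, checked on a grid
# large enough to establish a polynomial identity of this degree.
certificate_ok = all(
    generators[0](x, y) + generators[1](x, y) + x * generators[2](x, y) == 1
    for x, y in product(range(-2, 3), repeat=2)
)
```

Both observations agree: `common_zeros` is empty and the certificate exhibits 1 as a member of J, so the three premises are inconsistent.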

In the remainder of the section the computer algebra language CoCoA is 
used [11]. 

We consider now one of Boole’s examples for hypotheticals (5th example, 
MAL, p. 56): this is the example presented as illustration in subsection 3.3. 

i) Declare the ring of polynomials and the ideal I (the “elective symbols” x, y,
z, w are respectively denoted as x[1], x[2], x[3], x[4]).

A ::= Q[x[1..4]]; USE A;

I := Ideal(x[1]^2-x[1], x[2]^2-x[2], x[3]^2-x[3], x[4]^2-x[4]);

ii) Polynomial translation of Boole’s basic logical statements (see Subsection
3.3).

NEG(M) := NF(1-M, I);

OR1(M,N) := NF(M+N-M*N, I);

OR2(M,N) := NF(M+N-2*M*N, I);

AND1(M,N) := NF(M*N, I);

IMP(M,N) := NF(1-M+M*N, I);
iii) As explained above, Q can be replaced by Z_2 (then changing -1 to +1
and -2 to 0 in NEG, OR1, OR2 and IMP).

A ::= Z/(2)[x[1..4]];

iv) H1, H2, H3 and C1 respectively denote the hypotheses 1, 2 and 3 and the
conclusion.

H1 := IMP(x[1], x[2]);

H2 := IMP(x[4], x[3]);

H3 := OR1(x[1], x[4]);

C1 := OR1(x[2], x[3]);

v) Declare the ideal J generated by the (negations of the) hypotheses:

J := Ideal(NEG(H1), NEG(H2), NEG(H3));

vi) Does the negation of the conclusion belong to the ideal J generated by the
negations of the premises? The answer is YES if the following normal form is 0.

NF(NEG(C1), J+I);

vii) CoCoA gives 0 as output, as expected. 
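
The membership reported by CoCoA can also be exhibited by hand for this example. The sketch below does not compute a normal form; it checks an explicit combination of the three generators of J, with cofactors chosen by us, that equals the negation of the conclusion. All function names are hypothetical.

```python
from itertools import product

def neg_h1(x, y, z, w):
    return x * (1 - y)                # negation of "X implies Y"

def neg_h2(x, y, z, w):
    return w * (1 - z)                # negation of "W implies Z"

def neg_h3(x, y, z, w):
    return (1 - x) * (1 - w)          # negation of "X or W"

def neg_c1(x, y, z, w):
    return (1 - y) * (1 - z)          # negation of "Y or Z"

def combination(x, y, z, w):
    """Hand-chosen cofactors multiplying the three generators of J."""
    return ((1 - z) * neg_h1(x, y, z, w)
            + (1 - x) * (1 - y) * neg_h2(x, y, z, w)
            + (1 - y) * (1 - z) * neg_h3(x, y, z, w))

# Every term has degree at most 1 in each variable, so agreement on a grid
# with at least two values per variable establishes the polynomial identity.
grid = list(product(range(-2, 3), repeat=4))
identity_holds = all(combination(*pt) == neg_c1(*pt) for pt in grid)
```

In this particular example the combination holds as an identity of polynomials, so the negation of the conclusion already lies in J and the generators of I are not needed.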

The same argument can be applied to all the other examples in MAL (pp. 55-59).
As a matter of curiosity, CoCoA finds typographical errors in Boole’s
examples 6 and 7. 

4 Conclusions 

Boole’s logic was born as a branch of a general symbolic calculus known in his 
time as the “Method of Separation of Symbols”. The laws of logic, according to
Boole’s intuitions, were no more and no less than the symbolic expressions of the
laws of thought. All mathematical operations allowed inside the class of calculi
known to be based on the same laws of symbols he found for logic, regardless of their
quite different fields of application, were also allowed in logic. In particular,
Boole’s logical inference procedures were symbolic manipulations based on series
expansions and the solution of systems of equations.

Such a conception of logic led Boole to consider propositional logic state- 
ments as polynomials and to produce inferences by showing that (the polyno- 
mial expression of) consequences were algebraic combinations of (the polynomial 
expressions of) premises. 

In this way, Boole can be considered a forerunner both of modern symbolic
calculus in general and of interesting Computer Algebra approaches to Artificial
Intelligence in particular.

Boole’s intuitions that premises and conclusions in logic can be represented
as polynomials, and that conclusions are found by taking algebraic combinations
of premises, anticipated the application of Gröbner bases to the extraction of
consequences in logical systems in general, and to the verification and extraction
of knowledge in rule-based expert systems in AI in particular, as follows.

If the Gröbner basis of the ideal generated by the polynomials that translate
the (negation of the) rules and facts of a rule-based expert system is {1}, then
the ideal is the whole ring, so Theorem 1 says that any formula is a tautological
consequence of the information contained in the expert system. It means that 
the expert system leads to inconsistency, so it has to be corrected. Once incon-
sistencies have been suppressed, we can ask, by using Normal Forms, whether
or not a given formula, formed in the language in which the expert system has 
been built, is a tautological consequence of the information contained in the
system. In [35] we have described the application of this method to verification and
extraction of consequences in an expert system based on three-valued logic and
containing 50 variables. The verification of this expert system takes two minutes.
With just a Pentium-based PC with 128 MB of RAM, expert systems containing
150 variables under three-valued logic can be verified, the computing time being
about fifteen minutes.
Aspects of short-range planning dominate the strategic decision-making process
in management. The capability of managers to carry out their own functions 
effectively tends to be reduced by the increasing complexity and pressure of 
this job. Intelligent management systems (IMS) are needed to improve the 
decision-making process in management. Artificial intelligence and neural 
networks are very well matched to this need. The decision-making process is 
described in some detail, to illustrate what kinds of IMS functionalities are 
required, and thus to present the problem to specialists in artificial intelligence. 



The Challenges of a Modern Management Process 

Enterprises are more and more confronted with the consequences of dynamic change 
caused by the globalization of commerce and by the information flood that is due to 
the Internet. The radical change of the environment in which companies operate 
reduces the ability of managers to act on the strategic aspects of their task, and to 
conduct effective management of operations, in a timely way. Catching and 
structuring the complexity of the area that is being managed therefore tends to happen 
less, or less effectively. 

Existing management systems are not in a position to give optimal treatment to the 
daily data streams, or (therefore) to prepare the best management decisions. The 
evaluation of chances and risks becomes more difficult day by day. Thus, in practice, 
the percentage of risky or emotional decisions will increase, and the strategic
decision-making process will be dominated by the pressing considerations of short- 
range planning. 

The current trend of mergers and acquisitions shows one reason for the pressure of 
short-term considerations: the beliefs of shareholders about what constitutes value. In 
this respect, the idea of being the biggest bank in the world, and actual success as 
measured on stock exchanges, was a primary factor for the recent attempted merger 
between Deutsche Bank and Dresdner Bank. The key facts and milestones for any 
long procedure for achieving a functional merger between businesses received 
relatively little attention. A precise analysis of how well the different
cultures of the two banks fitted together was missing. There was no detailed implementation
planning. There was no worst-case scenario and no emergency plan. In consequence, 
the deal failed. 

This example underlines the actual situation of complex value chains, insufficient 
information, deficient evaluation of key facts, deficient research, inadequate tests, and 
decisions that were at best merely incorrect. If intelligent management systems
(IMS) had been available for identifying relevant data, checking the available relevant 
data, and providing a best available evaluation in time, some of these difficulties 
could have been reduced, or at least pointed out so that managers could give them 
proper consideration. Moreover, to be useful in a given management environment, 
IMS must include components for learning from the data and cases available. 
Producing good IMS is still a challenge, despite past achievements in some particular 
areas. 

In order for the challenge to be appreciated in detail, it is necessary to explain the 
managerial decision-making process (Knoppe, M., Strategische Allianzen in der
Kreditwirtschaft, Oldenbourg Verlag, p. 18-23, München 1997).
Fig. 1. Phases and activities of a modern decision making process

The Phases of a Modern Decision Making Process

Figure 1 shows the different phases of modern management. This management is an 
ongoing problem-solving process. To show the complexity of the activity, it helps to 
split the management process into different phases. Each phase has its own demands 
with respect to IMS, and consideration of the different phases side by side allows one 
to start forming impressions about what an appropriate computer-based architecture 
for IMS should be. 

The modern management process begins with a scanning phase. Scanning detects a
possible opportunity, threat, variation from a norm, or disturbance. Problem discovery 
then defines the problems that have been uncovered by the scanning. Diagnosis calls 
for more detailed information about the problems. Discovery and diagnosis determine 
the direction and location of search. Search and innovation produce redefinitions of 
the problems, changes in level of aspiration, and reinterpretations of what constitutes
an ideal solution. Search and innovation provide what is to be evaluated and chosen. 
Evaluation and choices narrow the range of possibilities for what will be sought. 
Search is conducted to justify what has already been chosen tentatively as a solution. 
Evaluation and choice must be authorized before being implemented. Rejected 
authorization or failed implementation forces re-evaluation, redesign or redefinition. 
Problem diagnosis determines evaluation and choice. Search is then eventually 
eliminated. The solutions to the problem are given by the diagnosis. The results of 
evaluation and choice modify the diagnosis and raise new problems. Implementation 
experience changes the focus of scanning and thereby improves the whole 
management process. "Controlling" checks whether or not the decisions and their 
consequences are still acceptable, fit the time constraints, and represent best practice. 
It is evident from their purpose that redefinition and scanning are a never-ending 
story. 

Phases 8 and 9 especially - controlling and redefinition - set up a modern
management process. This process is characterized by permanent learning activities, 
directed towards strategic issues and also towards tactics and management of 
operations. Good learning activities guarantee a management's quality and its 
competence to survive in a complex business world. Learning is not confined to the 
management; it also refers to any management systems that are in place. A modern
IMS must be expected to integrate whatever learning processes are needed to fulfil the
management's demands (Kirsch, W., Die Handhabung von Entscheidungsproblemen,
Barbara Kirsch Verlag, p. 180-191, München 1988).

By comparison with management processes in the past (which were without the 
presence of the Internet and high-tech systems), modern management and its
decision-making are dominated by a habitual lack of time. The element of time is the 
first consideration for the profit of a management system. Therefore, an IMS has to 
combine relevant time-dependent components and components that deal with the 
learning process(es). 
What Is a Management System?

A management system is in essence an additional organization (see Figure 2). These 
additional organizations overlay the actual business systems and management 
structures (basic organization). The staff has to undertake tasks derived from the basic 
organization and the management systems. Each task demands a special management 
system such as information systems, planning systems and "controlling" systems.
According to its function, any single management system is part of either a strategic
or operational architecture of management systems (see Figure 3). Strategic
management systems provide information and support for the long-range planning
process. "Operative" management systems are defined as those that support the short-
term decision-making process. The differentiation between strategic and operative
management systems shows that different features are required in the design and
implementation of the two types of system.

Fig. 2. Management systems as additional organizations

The point of a management system is to reduce the apparent complexity of an 
enterprise. To reach this target, the overall system of a company has to be split into a 
number of different subsystems (modules). Each module has a place in the 
architecture shown in Figure 3. The modules there have different tasks, and not every 
company needs to use all those modules. The modules are exchangeable, and should 
be adaptable to special needs. Among the modules there are different relationships 
which represent personal, organizational, technical and social connections (Kirsch, 
W., Managementsysteme, Barbara Kirsch Verlag, p. 128-139, München 1989).

In the past, management systems had no integrated learning process. They 
therefore did not evolve internally; all aspects of evolution remained outside the 
systems. Today management systems are routinely computer-based, and to meet the 
challenges of modern management processes, internal evolution is highly desirable.
Something like this view turned up in the 1960s, accompanying the building of the 
first computer-supported management systems. Broadly speaking, these failed, 
because appropriate hardware and software technology were not available at the time. 
During that period, Ackhoff (Ackhoff, R.L., Management Misinformation System,
in: Management Science, H. 4, p. 147-156, 1967) even talked about "management
misinformation systems". But during the last 10-15 years, hardware and software 
technology have developed so rapidly that they now offer the chance to realise the 
performance spectrum necessary for an intelligent management system. 

To survive in the new world of globalization, management needs qualified and 
detailed information for planning and controlling an organization's daily activities and 
for arriving at well-checked decisions quickly. To handle the challenges of a 
company's environment and compensate for the effects of rapidly-changing markets, 
an IMS must collect, filter, store and evaluate the information, must categorize data, 
and take advantage of any opportunity to improve any aspect of the management 
process. Furthermore, an IMS should be able to formulate scenarios, show 
alternatives, propose solutions, and deliver arguments to underpin the decisions that 
they propose. In order to do this, it should be capable of storing its experiences, and of 
adapting past experiences and past decisions to new situations. This means that it 
must be able to learn. In this respect it can provide a kind of organizational memory, 
as a way of making up for the fact that most companies feel the lack of well-trained 
staff and/or staff with long experience of the organization and the reasons for its 
particular procedures and decisions. We may never be able to say that an IMS is as
intelligent as a managerial brain, but if we can achieve similar levels of performance
in some knowledge-intensive areas of management, then the human managers will be 
helped by the IMS to check, evaluate and commit in a timely way to decisions that are 
good even from the strategic point of view. 
Dynamic change offers no time for human management to evaluate all the details
of decision parameters carefully. While an IMS should not be expected to take over 
the job of a manager, who should continue to make the final evaluation and the 
selection of a best decision, the IMS can deal with the initial steps, generation of 
alternatives etc., and can also reduce the risk of reaching incorrect decisions. 

Figure 4 shows the interfaces between the different phases of a decision process, 
IMSs, computer systems and managers (Alex Björn, Künstliche neuronale Netze in
Management-Informationssystemen, Grundlagen und Einsatzmöglichkeiten, Gabler
Verlag, p. 74-76, Wiesbaden 1998).
Applicability of Artificial Intelligence as a Management Tool

The human brain has the competence to interpret complex signals within 
milliseconds, and to make sense of new situations of a given general type by reference 
to knowledge about old situations of the same general type. This kind of activity is 
what a good manager does. Where artificial intelligence (AI) has analogues of this
behaviour, its techniques are likely to be of the greatest value within IMS 
implementations. 

The closest analogue in AI is in connectionism, and in the use of neural nets in 
particular. Compared to a traditional management system, a neurally-based IMS has 
the advantage that not much knowledge about the internal structure of a problem class 
is needed, but just knowledge about what it takes to specify problems in the class. 
Samples (problem examples, with solutions) are adequate to train the neural networks, 
so that they can be said to have learned about the associations between problems and 
solutions, and can exploit this knowledge in use on problems that arrive subsequently. 
As stated above, learning is a key to the future acceptability of IMS. Classical 
(symbolic) learning techniques in AI are not irrelevant, but it is likely that neural and 
other subsymbolic methods will match better the actual needs of IMS users. For 
example, the users (who are not AI specialists) will want to treat the learning 
components of an IMS as black boxes, while symbolic learning methods appear to 
demand that the user should know something about the details of what goes on inside 
the box. 
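
As a minimal sketch of what "training on samples" means here, the following trains a single perceptron on a handful of labelled examples and then applies it to new inputs. It is a toy illustration only, not the kind of network used in the applications cited in this section; the feature names and the data are hypothetical.

```python
# Toy illustration: a single perceptron trained on labelled samples.
# The data and the meaning of the features are hypothetical.

def train_perceptron(samples, epochs=50, rate=0.1):
    """samples: list of (features, label) pairs with label in {0, 1}."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted
            # Perceptron rule: the update is zero when the prediction is right.
            bias += rate * error
            weights = [w + rate * error * x for w, x in zip(weights, features)]
    return weights, bias


def predict(model, features):
    weights, bias = model
    return 1 if bias + sum(w * x for w, x in zip(weights, features)) > 0 else 0


# Hypothetical samples: (liquidity score, debt ratio) -> creditworthy (1) or not (0).
SAMPLES = [((0.9, 0.1), 1), ((0.8, 0.3), 1), ((0.2, 0.9), 0), ((0.3, 0.7), 0)]
MODEL = train_perceptron(SAMPLES)
```

The user of such a component only supplies the samples and reads off the prediction, which is the black-box usage described above.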

The most immediately evident applications of neural networks are in the three 
phases of diagnosis, evaluation and controlling. In fact, applications of this kind are 
quite numerous already in commerce, particularly in the financial sector (Goonatilake 
S. and Treleaven P. (eds.). Intelligent Systems for Finance and Business, John Wiley 
& Sons, Chichester, 1995). For example, banks use neural networks for assessing the 
creditworthiness of companies and individuals (the contribution by D. Leigh, "Neural 
Networks for Credit Scoring" at p. 61-69 in the book quoted above). Traders on stock 
exchanges use neural networks to classify shares according to their potentialities and 
risks (contribution by Refenes A.N., Zapranis A.D., Connor J.T. and Bunn D.W., 
"Neural Networks in Investment Management", p. 177-208). Neural networks also 
support market analysis in various industries, e.g. direct mailing (Furness P., "Neural 
Networks for Data-Driven Marketing", p. 73-96), food products (coffee) and toiletries 
(shampoo). Other applications include assessing the development of market shares, 
and optimizing solutions to problems in production processes. These examples cover 
all of the three phases of managerial activity mentioned above. While they are not 
specifically "managerial", each activity is of direct value for some managerial 
process, and strong analogues of the kinds of computation involved are relevant for 
IMS applications. For instance, an analogue of the creditworthiness check would be a 
check of the credibility of arguments/data in favour of a business merger. The aim of 
a diagnosis phase here should be to filter similar features of past mergers. By 
comparing different features, and matching problem situations and risks and outcomes 
of similar mergers, neural networks may be able to make interesting suggestions 
about the problem of a merger currently under consideration. Because of the 
heterogeneity of the data about this problem, and the likelihood that the volume or 
quality of data will not permit safe conclusions to be drawn from statistical methods, 
neural networks should be able to come up with better results. 

It can be said that examples like those above have validated the use of neural 
networks for IMS applications in the future. 

In principle there are no phases in Figure 1 that subsymbolic AI cannot handle 
with neural networks, though more recent techniques such as genetic algorithms are 
also demonstrating their worth where the underlying problem-solving activity is 
search rather than the classification or association that occurs when neural networks 
are used (J. Koza, "Genetic Programming for Economic Modeling", p. 251-269, same 
book). Genetic programming is also useful more widely, e.g. for operations involving 
design and/or uncertainty (Smith A.E. and Norman B.A., "Evolutionary Design of 
Facilities Considering Production Uncertainty", in I. Parmee (ed.), Adaptive 
Computing in Design and Manufacture 2000, p. 175-186, Springer-Verlag, Berlin, 
2000). And there are methods located somewhere between subsymbolic and 
traditional symbolic AI which need to be understood better and applied more in 
managerial areas. Bayesian belief networks are a good example; one of several 
existing applications with direct interest to management (assessing dependability of 
"systems") has been reported by Neil, Littlewood and Fenton (Neil M., Littlewood B. 
and Fenton N., "Applying Bayesian Belief Networks to Systems Dependability 
Assessment", Proceedings of Safety Critical Systems Club Symposium, Leeds, 
p. 71-93, Springer-Verlag, Berlin, 1996). Moreover, there are situations where symbolic AI 
still has some relevance. Even if methods based explicitly on mathematical logic 
happen to be too slow or unwieldy for the time-limited needs of an IMS, more 
flexible symbolic methods such as case-based reasoning (Kolodner, J., Case-Based 
Reasoning, Morgan Kaufmann, San Mateo, 1993) can be considered; they are 
intended to deal with just the kinds of activity mentioned near the end of the previous 
section. Furthermore, it is possible that for some applications, particularly quantitative 
ones (as in production optimisation, and treatment of business-oriented econometric 
data), some hybrid treatment involving a mixture of symbolic and subsymbolic 
methods will be advisable. There are many open problems and open areas of 
applicability, with plenty of data; it is a challenge for an AI audience to produce new 
results and new tools to assist in IMS development and application. 

With respect to neural networks at least, Figure 5 shows some conceivable 
applications within an IMS. 

Some of these applications are not purely activities that can be reduced to making 
associations or classifications with respect to past and current examples of problems. 
It may therefore appear that traditional neural-network schemes are not adequate to 
express all the contents of the applications. But there are interesting and relatively 
new extensions of the traditional approach, which could benefit from exposure to such 
applications - and vice versa. In the present book, there are examples of interesting 
extensions in the contributions by J. Pfalzgraf and by A. Iglesias and A. Galvez. 

A further challenge for suppliers of neural networks - and, indeed, any tool derived 
from AI - is that the usefulness of a technique to management, and especially top 
management, depends positively on how little the users are expected to know about 
its theoretical side (or, in other words, about the inside of the black box). The 
approach to educating users about a technique has to be easy, and the time needed for 
the approach has to be short. Neural networks respect these considerations, which is 
why they are very attractive as IMS components. 

Fig. 5. The field of neural networks as a tool of a management system 



Knowledge about methods of neural networks needs to be kept separate from the 
knowledge of users about the decision-making and the commercial process that is 
being managed. The user should not be required to know anything substantial about 
the former. Figure 6 illustrates a design for the functional behaviour of an IMS, 
involving neural components, which respects these considerations. 

This has been a look at the area of IMS from a user's perspective, identifying needs 
and some possible ways for AI specialists to fill those needs. It has not considered 
explicitly mathematical knowledge or reasoning, though there are many areas of 
management practice where dealing with large amounts of quantitative information 
implies the effective use of mathematical knowledge. Integrating the special 
perspective and skills of scientists working on "artificial intelligence and symbolic 
computation" with the problems of the IMS area is a general challenge which contains 
many particular problems. The AI community can support practice by developing 
theoretical models and methods adapted to the methods and demands of 
economically-based management decision-making. To end with just one example, 
general methods for the plausible explanation and justification of the outputs from 
neural networks would obviously have substantial immediate value, but we are still in 
need of them (Alex Björn, Künstliche neuronale Netze in 
Management-Informationssystemen, Grundlagen und Einsatzmöglichkeiten, Gabler Verlag, 
Wiesbaden 1998). 

Fig. 6. A general design of a management process software based on neural networks 

Abstract. In this paper we present OMDoc, an extension of the OpenMath 
standard that allows the representation of the semantics and structure 
of various kinds of mathematical documents, including articles, textbooks, 
interactive books, and courses. It can serve as the content language 
for agent communication of mathematical services on a mathematical 
software bus. 



1 Introduction 

It is plausible to expect that the way we do (conceive, develop, communicate 
about, and publish) mathematics will change considerably in the next ten years. 
The Internet plays an ever-increasing role in our everyday life, and most of 
the mathematical activities will be supported by mathematical software sys- 
tems (we will call them mathematical services) connected by a commonly ac- 
cepted distribution architecture, which we will call the mathematical software 
bus. We have argued for the need of such an architecture in [SHS98, FHJ+99], 
and we have in the meantime gained experience with the MathWeb system 
that provides a general distribution architecture (see [FK99b]); other groups 
have conducted similar experiments [DCN+00, AZ00] based on other implementation 
technologies, but with the same vision of creating a world wide web of 
cooperating mathematical services. In order to avoid fragmentation and duplicated 
inventions, and to foster ease of access, it is necessary to define interface standards 
for MathWeb^. In [FHJ+99], we have already proposed a protocol based on the 
agent communication language Kqml [FF94] and the emerging Internet standard 
OpenMath [AvLS 96, CC98] as a content language (see Fig. 1). This layered 
architecture which refines the unspecific “application layer” of the OSI proto- 
col stack is inspired by the results from agent-oriented programming [Sho90], 
and is based on the intuition, that all agents (not only mathematical services) 
should understand the agent communication language, even if they do not under- 
stand the content language, which is used to transport the actual mathematical 

^ We will for the purposes of this paper subsume all of the implementations by the 
term MathWeb, since the communication protocols presented in this paper will 
make the construction of bridges between the particular implementations simple, 
so that the combined systems appear to the outside as one homogeneous web. 

Fig. 1. Artificial Communication: Kqml and the OSI Reference Model 

[Figure: layer stack, listed from top to bottom: Application Layer; Content Layer, 
e.g. OpenMath/CASL; Performative Layer, e.g. KQML; Presentation Layer, e.g. XML 
(DTD); Session Layer, e.g. LU6.2; Transport Layer, e.g. TCP; Network Layer, e.g. IP; 
Link Layer, e.g. X.21; Physical Layer, e.g. Ethernet.] 



content. The agent communication language is used to establish agent identity, 
reference and, in general, to model the communication protocols (see [AK00] for 
details in the case of mathematical services). Thus we can concentrate on the 
content language in this paper. 
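
The separation between the agent communication language and the content language can be sketched as follows. This is only an illustration of the layering idea under simplifying assumptions: the `Message` class, its field names and the `route` function are hypothetical and do not reproduce the Kqml syntax or any MathWeb interface; only the performative names `ask-one` and `tell` are taken from Kqml.

```python
from dataclasses import dataclass


@dataclass
class Message:
    performative: str  # e.g. "ask-one" or "tell"
    sender: str
    receiver: str
    language: str      # names the content language, e.g. "OpenMath"
    content: str       # opaque payload for agents that do not know `language`


def route(message, known_languages):
    """Decide what an agent can do with a message.

    Every agent can read the envelope; only agents that know the
    content language may interpret the payload.
    """
    if message.language in known_languages:
        return ("interpret", message.content)
    # The envelope is still understood, so the message can be passed on.
    return ("forward", message.receiver)
```

An agent that does not know OpenMath can therefore still forward a query to a mathematical service, which is the intuition stated above.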

The experience with MathWeb in general, and with the ΩMEGA system 
- a mathematical assistant system based on several MathWeb services 
(see [BCF+97]) - in particular has shown that it is not sufficient to be able to 
communicate mathematical objects; mathematical knowledge in general must be communicated as well. 
Support for the communication of mathematical objects is already provided by 
OpenMath, which is 

[. . . ] a standard for representing mathematical objects, allowing them 
to be exchanged between computer programs, stored in databases, or 
published on the worldwide web. [. . . ] [CC98] 

This is sufficient for symbolic computation services like computer algebra sys- 
tems, which manipulate (simplify) or compute objects like equations or groups. 
Even though the logical formulae constructed or manipulated by reasoning systems 
like the ΩMEGA system can be expressed as OpenMath objects, mathematical 
services like reasoners or presentation systems need more information, 
e.g.: 

1. is this formula an axiom, a definition, or a theorem to be proven? 

2. what is a good strategy to proceed with the proof in this domain? 

3. is this constant basic, or defined (so that it can be expanded to a formula 
involving simpler concepts)? 

4. what is the common name of this concept (and its grammatical category)? 

Unfortunately, OpenMath fulfills this goal only partially, since it deals exclusively 
with the representation of the mathematical objects proper. Of course it 
would be possible to characterize an axiom by applying a predicate “axiom” to 
a formula or using a special variant of the equality relation for definitions, but 
this would only solve item 1 above. 

This paper is concerned with the question of a communication standard for 
mathematical knowledge. We propose an extension OMDoc of the OpenMath 
standard to alleviate this perceived limitation. We will use mathematical doc- 
uments as a guiding intuition for mathematical knowledge, since almost all of 
mathematics is currently communicated in this form (publications, letters, e- 
mails, talks,. . . ). To ensure widespread applicability, we will use the term docu- 
ment in an inclusive, rather than exclusive way (including papers, letters, inter- 
active books, e-mails, talks, communication between mathematical services (see 
for instance [FK99b, FHJ+99]) on the Internet,...), claiming that all of these 
can be fitted into a common representation. Since such documents normally 
have a complex structure of their own, the specific task to be achieved in the 
extension to OpenMath is to provide a standardized infrastructure for this as 
well. As we will use the Internet standard Xml [BPSM97] (see section 2) as a 
basis for this, we can consider the syntax problem for communication in Math- 
Web as solved by the imminent wider acceptance of Xml (OpenMath is based 
on Xml and we have defined an Xml representation for Kqml in [FK99a]). 

Another piece of infrastructure which will play a role for understanding OMDoc 
is the MBase system [FK00, KF00], a MathWeb service that acts as a 
distributed mathematical knowledge base system that can answer questions such 
as the ones shown above. OMDoc serves as an input/output language for MBase, 
so that MBase can also be used for document preparation. Thus 
the system offers a service that allows storage and (flexible) reproduction of 
(parts of) OMDoc documents. As OMDoc can be transformed directly to e.g. 
LaTeX, external input to MBase can be published directly. 

To evaluate the scope of OMDoc, let us look at a few possible applications. 
OMDoc can serve as 

— a communication standard between mechanized reasoning systems, e.g. the 
Clam-Hol interaction [BSBG98], or the ΩMEGA-TPS [BBS99] integration. 

— a data format that supports the controlled refinement from informal presen- 
tation to formal specification of mathematical objects and theories. Basi- 
cally, an informal textual presentation can first be marked up, by making 
its discourse structure^ explicit, and then formalizing the textually given 
mathematical knowledge in logical formulae (by adding FMP elements; see 
sections 5 and 2). 

— a basis for individualized (interactive) books. OMDoc documents can be 
generated from MBase making use of the discourse structure information 
encoded in MBase. 

— an interface for proof presentation [HF97, Fie99]: since the proof part of 
OMDoc allows small-grained interleaving of formal (FMP) and textual (CMP) 
presentations. 

^ classifying text fragments as definitions, theorems, proofs, linking text, and their 
relations; we follow the terminology from computational linguistics here. 

These and similar applications are pursued in the ΩMEGA project at Saarland 
University, Saarbrücken (see http://www.ags.uni-sb.de/~omega) in cooperation 
with the RIACA project at Eindhoven. 

In the next section we will review the Internet standards and their architecture, 
which form the basis for OMDoc, before we come to the definition of OMDoc proper. 



2 Markup, Xml, OpenMath, MathMl, and OMDoc 

Mathematical (and other) texts are often written on text processors (which are 
often WYSIWYG type). Many authors consistently confuse information and doc- 
ument structure with presentation by associating formatting characteristics with 
various textual document components. Even in LaTeX, one can mix structural 
markup like \chapter{Title} or 

\begin{Definition}[Title] ... \end{Definition} 

with presentation markup, such as font size information, or using 

{\bf proof}: ... \hfill\Box 

to indicate the extent of a proof. 

The problem with presentation markup is that it is specified for human con- 
sumption, and although it is machine-readable, the data presented in the docu- 
ment is not machine-understandable. Generally, it is very hard to automate any- 
thing for documents, when their structure is specified by presentation markup. 

With the advent of the Internet, which is quickly becoming the world’s fastest 
growing repository of mathematical documents, it is not possible to manage all 
the available knowledge manually, because of the volume of information dis- 
tributed over the Web. 

The generally accepted solution is to use logical or generic markup, i.e. to 
describe the structure of the data contained in the documents. In this markup 
scheme, the logical function of all document elements - title, section, paragraphs, 
figures, tables, bibliographic references, or mathematical equations or definitions 
- must be clearly defined in a machine-understandable way. 

This motivation has led to the development of the “Standard Generalized 
Markup Language” SGML, and more recently to the “Extensible Markup Language” 
Xml [BPSM97] family of markup languages. Xml was designed as a simplified 
subset of SGML that can serve as a rational reconstruction of the “Hypertext 
Markup Language” HTML [RHJ98], which carries most of the markup on 
the Internet today. From SGML, Xml inherits the concept of a “document type 
definition” (DTD), i.e. a grammar that defines the set of well-formed documents 
in a given Xml language and in particular, allows documents to be validated by 
generic tools (parsers). Moreover, presentation markup for the data specified in 
an Xml document can be flexibly generated by using the XSL style sheet mech- 
anism [Dea99]. In particular, it is possible to use more than one XSL style sheet 
for a given document to generate specialized presentations (e.g. personalized to 
the tastes of a specific reader) of contained data using the content markup in 
the document. 

Thus the “content markup” paradigm gives improved presentation (for hu- 
man consumption) and improved machine readability at the same time. This 
has led to considerable activity in developing specialized markup schemes for 
specific application areas. (This paper is an instance of this activity). 
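
The idea that one content representation can feed several presentations can be sketched without any XSL machinery. In the following illustration the nested-tuple encoding and the two template tables are hypothetical stand-ins for a content-marked-up document and two style sheets; they are not part of Xml, XSL or OMDoc.

```python
# Content as a nested tuple: (symbol, argument, ...); leaves are variable names.
CONTENT = ("eq", ("plus", "a", "b"), ("plus", "b", "a"))

# Two hypothetical "style sheets": one presentation template per symbol.
INFIX = {"eq": "{0} = {1}", "plus": "{0} + {1}"}
WORDS = {"eq": "{0} equals {1}", "plus": "the sum of {0} and {1}"}


def present(term, style):
    """Render a content term with the templates of the given style."""
    if isinstance(term, str):
        return term
    symbol, *args = term
    return style[symbol].format(*(present(arg, style) for arg in args))
```

Here `present(CONTENT, INFIX)` yields `a + b = b + a`, while `present(CONTENT, WORDS)` yields `the sum of a and b equals the sum of b and a`; the content itself is untouched in both cases.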

OpenMath is a content markup language for communicating mathematical 
objects, realized as an Xml language. Its syntax (given by a DTD) and semantics 
are specified in the evolving OpenMath standard [CC98]. The central construct 
of OpenMath is that of an OpenMath object (OMOBJ), which has a tree-like 
representation made up of applications (OMA), binding structures (OMBIND using 
OMBVAR to tag the bound variables), variables (OMV) and symbols (OMS). 

Fig. 2 shows an OpenMath representation of the law of commutativity for 
addition on the reals (the logical formula $\forall a, b.\ a \in \mathbb{R} \wedge b \in \mathbb{R} \Rightarrow a + b = b + a$). 

The mathematical meaning of a symbol (that of applications and bindings is 



<OMOBJ id="commutativity-formula"> 
  <OMBIND> 
    <OMS cd="quant1" name="forall"/> 
    <OMBVAR> 
      <OMV name="a"/> 
      <OMV name="b"/> 
    </OMBVAR> 
    <OMA><OMS cd="logic1" name="implies"/> 
      <OMA><OMS cd="logic1" name="and"/> 
        <OMA><OMS cd="set1" name="in"/><OMV name="a"/><OMS cd="barshe" name="real"/></OMA> 
        <OMA><OMS cd="set1" name="in"/><OMV name="b"/><OMS cd="barshe" name="real"/></OMA> 
      </OMA> 
      <OMA><OMS cd="relation" name="eq"/> 
        <OMA><OMS cd="barshe" name="plus-real"/><OMV name="a"/><OMV name="b"/></OMA> 
        <OMA><OMS cd="barshe" name="plus-real"/><OMV name="b"/><OMV name="a"/></OMA> 
      </OMA> 
    </OMA> 
  </OMBIND> 
</OMOBJ> 



Fig. 2. An OpenMath representation of $\forall a, b.\ a + b = b + a$. 
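
To make the tree structure of an OpenMath object concrete, the following sketch parses a shortened variant of the object in Fig. 2 (without the membership conditions) with Python's standard `xml.etree.ElementTree` module and renders it in prefix notation. The function `to_prefix` and the `cd:name` output convention are illustrative assumptions, not part of the OpenMath standard.

```python
import xml.etree.ElementTree as ET

OMOBJ = """
<OMOBJ id="commutativity-formula">
  <OMBIND>
    <OMS cd="quant1" name="forall"/>
    <OMBVAR><OMV name="a"/><OMV name="b"/></OMBVAR>
    <OMA><OMS cd="relation" name="eq"/>
      <OMA><OMS cd="barshe" name="plus-real"/><OMV name="a"/><OMV name="b"/></OMA>
      <OMA><OMS cd="barshe" name="plus-real"/><OMV name="b"/><OMV name="a"/></OMA>
    </OMA>
  </OMBIND>
</OMOBJ>
"""


def to_prefix(node):
    """Render an OpenMath element as a prefix-notation string."""
    if node.tag == "OMOBJ":
        return to_prefix(node[0])
    if node.tag == "OMV":
        return node.get("name")
    if node.tag == "OMS":
        return node.get("cd") + ":" + node.get("name")
    if node.tag == "OMA":
        # First child is the applied symbol, the rest are its arguments.
        head, *args = list(node)
        return to_prefix(head) + "(" + ", ".join(to_prefix(a) for a in args) + ")"
    if node.tag == "OMBIND":
        # Children: binder symbol, bound variables, body.
        binder, bound, body = list(node)
        names = ", ".join(v.get("name") for v in bound)
        return to_prefix(binder) + " " + names + ". " + to_prefix(body)
    raise ValueError("unexpected element: " + node.tag)


RENDERED = to_prefix(ET.fromstring(OMOBJ))
```

For the object above, `RENDERED` is `quant1:forall a, b. relation:eq(barshe:plus-real(a, b), barshe:plus-real(b, a))`.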



known from the folklore) is specified in a so-called content dictionary, which 
contains formal (FMP, “formal mathematical property”) or informal (CMP, “commented 
mathematical property”) specifications of the mathematical properties 
of the symbols. For instance, the specification 



<CDDefinition> 
  <Name>plus</Name> 
  <Description>Addition on real numbers</Description> 
  <CMP>Addition is commutative</CMP> 
  <FMP><OMOBJ xref="commutativity-formula"/></FMP> 
</CDDefinition> 

could be part of the content dictionary^ barshe.cd for elementary properties of 
real numbers (cf. section 4.2 for the relation of content dictionaries with OMDoc 
documents). 
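
The xref attribute used in the FMP above shares a formula by reference instead of repeating it. A minimal sketch of how such references could be expanded back into plain trees is given below; the function `expand_xrefs` is a hypothetical helper, it assumes that every xref names an OMOBJ with a matching id in the same document, and it does not follow references nested inside a referenced object.

```python
import copy
import xml.etree.ElementTree as ET


def expand_xrefs(root):
    """Replace every <OMOBJ xref="..."/> by a copy of the OMOBJ with that id."""
    by_id = {el.get("id"): el for el in root.iter("OMOBJ") if el.get("id")}
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "OMOBJ" and child.get("xref"):
                target = copy.deepcopy(by_id[child.get("xref")])
                del target.attrib["id"]  # keep ids unique after copying
                target.tail = child.tail
                parent[index] = target
    return root


DOC = ET.fromstring(
    '<doc>'
    '<OMOBJ id="f"><OMV name="a"/></OMOBJ>'
    '<FMP><OMOBJ xref="f"/></FMP>'
    '</doc>'
)
expand_xrefs(DOC)
```

After the call, the FMP element of `DOC` contains a full copy of the referenced object, while the original object keeps its id.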

MathMl [IM98] is another XML-based markup scheme for mathematics. In 
contrast to OpenMath, it is more concerned with presentation markup (trying 
to reach LaTeX quality on the web) than with logical markup. Moreover, it is 
mainly concerned with the K-12 fragment of mathematics (Kindergarten to 12th 
grade). OpenMath is well-integrated with MathMl: 

— the basic content dictionaries of OpenMath mirror the MathMl con- 
structs, and there are converters between the two formats. 

— MathMl supports the semantics element that can be used to annotate 
MathMl presentations of mathematical objects with their OpenMath en- 
coding, and OpenMath supports the presentation attribute that can be 
used for annotating with MathMl presentation. 

— OpenMath is the designated extension mechanism for MathMl beyond 
K-12 mathematics. 

Therefore, it is not a limitation of the presentational capabilities to use Open- 
Math for marking up mathematical objects. As MathMl can be viewed 
by the WebEQ plug-in and is going to be natively supported by the pri- 
mary browsers MS Internet Explorer and Netscape Navigator in ver- 
sion 6 (see http://www.mozilla.org for Mozilla, the open source version), 
MathMl will be the primary presentation language for OMDoc. 

Since OMDoc is an extension of OpenMath, it inherits its connections 
to Xml and MathMl. The structure of OMDoc documents is defined 
in the OMDoc document type definition (DTD) (cf. [Koh00b] or 
http://www.mathweb.org/ilo/omdoc, where you can also find worked exam- 
ples (including part of a mathematical textbook [BS82] and an interactive 
book [CCS99] (IDA))). 

An OMDoc document is bracketed by the Xml tags <omdoc> and </omdoc>, 
and consists of a sequence of OMDoc elements, which contain specialized rep- 
resentations for text, assertions, theories, definitions,. . . (see below). In contrast 
to markup languages like LaTeX, OMDoc does not partition the documents into 
specific units like chapters, sections, paragraphs, by tags and nesting informa- 
tion, but makes these document relations explicit with omgroup elements (see 
section 7.3). This choice is motivated by the generality of the document classes 
and the fact that the relative position of OpenMath documents can be de- 
termined in the presentation phase. In particular, since OpenMath documents 
can be hypertext documents, or generated from a database, it can be impossi- 
ble to determine the structure of a document in advance, therefore we consider 

^ In fact the reference <OMOBJ xref="commutativity-formula"/> pointing to the 
OMOBJ with the id attribute commutativity-formula uses an extension of OMDoc 
to OpenMath that allows us to represent formulae as directed acyclic graphs, 
preventing exponential blowup. It is licensed by the OpenMath standard, since pure 
OpenMath trees can be generated automatically from it. 
document structure information as presentation information and describe it in 
section 7.3. 

The general pattern “definition, theorem, proof” has long been considered 
paradigmatic of mathematical documents like textbooks and papers. To support 
this structure, OMDoc provides elements for mathematical items and theory 
items which we will describe in sections 4 and 5. Since proofs have a more 
complex internal structure, we will defer them to section 6. Before we come to 
these, we will describe the structure of intermediate explanatory text (section 3) . 
Finally, we will reserve section 7 for auxiliary items like exercises, applets, etc. 



3 Text Elements 

The OMDoc text elements are Xml elements that can be used to accommodate 
and classify the explanatory text parts in mathematical documents. We have two 
kinds of them: 

CMP These text elements are used for comments and describing mathematical 
properties inside other OMDoc elements. They have an xml:lang attribute 
that specifies the language they are written in; thus using groups of CMPs with 
different languages can promote OMDoc internationalization. Conforming 
with the Xml recommendation, we use the ISO 639 two-letter language codes 
(en = English, de = German, fr = French, nl = Dutch, ...). 

CMPs may contain arbitrary text interspersed with OpenMath objects 
(OMOBJ elements) (see the OpenMath standard [CC98] for details), omlets 
(see section 7) and hyperlinks (see below). No other elements are allowed. In 
particular, presentation elements like paragraphs, emphases, itemizes,. . . are 
forbidden, since OMDoc is concerned with content markup. Generating pre- 
sentation markup from this is the duty of specialized presentation compo- 
nents, e.g. XSL style sheets, which can base their decisions on presentation 
information (see section 7.3) and the rsrelation information described in 
this section. 

ref elements are used to specify hyperlinks via the XLink/XPointer specification 
(see http://www.w3c.org/TR/{xlink/xptr}). If the referenced object 
is defined in the same document, then it is sufficient to specify its id attribute 
in the xlink:href attribute; otherwise, it must include the relevant 
URL or XPointer material. 

omtext OMDoc text elements can appear on the top level (inside omdoc elements). 
They have an id attribute, so that they can be cross-referenced, and 
(optional) rsrelation attributes specifying the rhetorical structure relation 
of the text to other OMDoc elements, and contain 

1. an (optional) metadata declaration (we use the well-known Dublin Core 
schema, cf. http://purl.org/dc/ or see [Koh00b]) 

2. a non-empty set of CMP elements that contain the text proper. 
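
The use of several CMP elements with different xml:lang values can be sketched as follows, using Python's standard `xml.etree.ElementTree` module. The sample omtext, its German wording and the helper `pick_cmp` with its fallback rule are illustrative assumptions; OMDoc itself does not prescribe how a presentation component selects a language.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

OMTEXT = ET.fromstring(
    '<omtext id="t1">'
    '<CMP xml:lang="en">Addition is commutative.</CMP>'
    '<CMP xml:lang="de">Die Addition ist kommutativ.</CMP>'
    '</omtext>'
)


def pick_cmp(element, preferred, fallback="en"):
    """Return the CMP text in the preferred language, else in the fallback."""
    by_lang = {c.get(XML_LANG): c.text for c in element.findall("CMP")}
    if preferred in by_lang:
        return by_lang[preferred]
    return by_lang.get(fallback)
```

A reader asking for `de` gets the German text, while a request for a language that is not present (say `fr`) falls back to the English one.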

The rsrelation attributes allow us to mark up the discourse structure of a 
document in the form of so-called discourse relations, following the well-known 
“Rhetorical Structure Theory” RST [MT83, Hor98] content model, which models 
a text as a tree whose leaves are the sentences (or phrases) and whose internal 
nodes model the relations between their daughters. This generalizes markup 
schemes of text fragments offered e.g. by LaTeX into categories like “Introduction”, 
“Remark”, or “Conclusion”. Such categories are sufficient for simple markup of existing 
mathematical texts and to replay them verbatim in a browser, but are insufficient 
e.g. for generating individualized presentations at multiple levels of abstraction 
from the representation. The OMDoc text model - if taken to its extreme - can 
be used to pinpoint the respective role and contributions of smaller text units, 
even down to the sub-sentence level, and can make the structure of mathematical 
texts “machine understandable”. 

Concretely, the rsrelation attributes specify the relation type in a type attribute 
and the RST tree daughters in the attributes for (for the head daughter) and 
from (for the others). At the moment OMDoc uses a variant of the RST [MT83] 
content model that supports the relation types introduction, conclusion, 
thesis, antithesis, elaboration, motivation, evidence, linkage with the 
obvious meanings, motivated by the application to mathematical argumentative 
texts (see also [Hor98]). The relation type also determines the default presenta- 
tion. 



4 Theory Elements 

Traditionally, mathematical knowledge has been partitioned into so-called the- 
ories, often centered about certain mathematical objects like groups, fields, or 
vector spaces. Theories have been formalized as collections of 

— signature declarations (the symbols used in a particular theory, together with 
optional typing information). 

— axioms (the logical laws of the theory) . 

— theorems; these are in fact logically redundant, since they are entailed by 
the axioms. 

In software engineering a closely related concept is known under the label of 
an (algebraic) specification, which is used to specify the intended behavior of 
programs. There, the concept of a theory (specification) is much more elaborated 
to support the structured development of specifications. Without this structure, 
real world specifications become unwieldy and unmanageable. 

In OMDoc, we support this structured specification of theories; we build 
upon the technical notion of a development graph [Hut99], since this supplies a 
simple set of primitives for structured specifications and also supports man- 
agement of theory change. Furthermore, it is logically equivalent to a large 
fragment of the emerging Casl standard [CoF98] for algebraic specification 
(see [AHMS00]). 

Theories are specified by the theory element in OMDoc. Since signature 
and axiom information is particular to a given theory, the symbol, definition, 
axiom elements must be contained in a theory as sub-elements. 

<theory id="monoid-thy">... 
  <symbol id="monoid"> 
    <commonname xml:lang="en">monoid</commonname> 
    <commonname xml:lang="de">Monoid</commonname> 
    <commonname xml:lang="it">monoide</commonname> 
    <type system="simply-typed"> 
      set[any] -> (any -> any -> any) -> any -> bool 
    </type> 
  </symbol>... 
</theory> 



Fig. 3. An OMDoc symbol declaration 



symbol This element specifies the symbols for mathematical concepts, such as 1 
for the natural number “one” , + for addition, = for equality, or group for the 
property of being a group. The symbol element has an id attribute which 
uniquely identifies it. This information is sufficient to allow referring back to 
this symbol as an OpenMath symbol. For instance the symbol declaration 
in Fig. 3 gives rise to an OpenMath symbol that can be referenced as <0MS 
cd="monoid" nEmie="monoid"/>. If the document containing this symbol 
element were stored in a data base system, the OpenMath symbol could 
be looked up by its common name. The type information specified in the 
signature element characterizes a monoid as a three-place predicate (taking 
as arguments the base set, the operation and a neutral element).
definition Definitions give meanings to (groups of) symbols (declared in a 
symbol element elsewhere) in terms of already defined ones. For example 
the number 1 can be defined as the successor of 0 (specified by the Peano 
axioms). Addition is usually defined recursively, etc. 

The OMDoc definition element currently supports several kinds of definition
mechanisms, selected by the type attribute:

The FMP (see section 5) contains an OpenMath representation of a logi- 
cal formula that can be substituted for the symbol specified in the for 
attribute of the definition. 

The formal part is given by a set of recursive equations whose left and 
right hand sides are specified by the pattern and value elements in 
requation elements. The termination proof necessary for the well-defi- 
nedness of the definition can be specified in the just-by attribute of the 
definition. 

Here, the FMP elements contain a set of logical formulae that uniquely de- 
termines the value of the symbols that are specified in the for slot of the 
definition. Again, the necessary proof of unique existence can be specified 
in the just-by attribute. 

This can be used to directly give the concept defined here as an OpenMath 
object, e.g. as a group representation generated by a computer algebra 
system. 

Fig. 4 gives an example of a (simple) definition of a monoid.

For a description of abstract data types see [Koh00b].



<definition id="mon.d1" for="monoid" type="simple">
  <CMP>
    A structure (M,*,e), in which (M,*) is a semi-group
    with unit e is called a monoid.
  </CMP>
</definition>



Fig. 4. A Definition of a monoid 



4.1 Complex Theories and Inheritance 

Not all definitions and axioms need to be explicitly stated in a theory; they can 
be inherited from other theories, possibly transported by a signature morphism.
The inheritance information is stated in an imports element. 

imports This element has a from attribute, which specifies the theory which 
exports the formulae. 

For instance, given a theory of monoids using the symbols set, op, neut 
(and axiom elements stating the associativity, closure, and neutral-element 
axioms of monoids) , a theory of groups can be given by the theory definition 
using import in Fig. 5. 



<theory id="group">
  <imports id="group.import" from="monoid" type="global"/>
  <axiom><CMP> Every object in
    <OMOBJ><OMS cd="monoid" name="set"/></OMOBJ> has an inverse.
  </CMP></axiom>
</theory>



Fig. 5. A theory of groups based on that of monoids 



morphism The morphism is a recursively defined function (it is given as a set of
recursive equations using the requation element, described above). It allows
specifications to be imported modulo a certain renaming. With this, we can e.g.
define a theory of rings, where a ring is given as a tuple $(R, +, 0, -, *, 1)$, by
importing from a group $(M, \circ, e, i)$ via the morphism
$\{M \mapsto R, \circ \mapsto +, e \mapsto 0, i \mapsto -\}$
and from a monoid $(M, \circ, e)$ via the morphism
$\{M \mapsto R^*, \circ \mapsto *, e \mapsto 1\}$,
where $R^*$ is $R$ without 0 (as defined in the theory of monoids).
inclusion This element can be used to specify applicability conditions on the 
import construction. Consider for instance the situation given in Fig. 6, 
where the theory of lists of natural numbers is built up by importing from 
the theories of natural numbers and lists (of arbitrary elements). The latter
imports the element specification from the parameter theory of elements;
thus, to actualize lists to lists of natural numbers, all the symbols
and axioms of the parameter theory must be respected by the natural
numbers. For instance, if the parameter theory specifies an ordering relation
on elements, this must also be present in theory Nat, and have the same
properties there. These requirements can be specified in the inclusion element
of OMDoc. Due to lack of space, we will not elaborate on this and refer
the reader to [Hut99, Koh00b].




Fig. 6. A Structured Specification of Lists 
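
The renaming performed by a morphism in the ring example above can be sketched in Python as a plain mapping on symbol names. This is only an illustration under simplifying assumptions: signatures are modelled as tuples of strings, the function apply_morphism is hypothetical, and the recursive equations of an actual morphism element are not modelled.

```python
def apply_morphism(signature, morphism):
    """Rename each symbol of an imported signature; unmapped symbols are kept."""
    return tuple(morphism.get(sym, sym) for sym in signature)


group = ("M", "o", "e", "i")   # carrier, operation, neutral element, inverse
monoid = ("M", "o", "e")       # carrier, operation, neutral element

from_group = {"M": "R", "o": "+", "e": "0", "i": "-"}
from_monoid = {"M": "R*", "o": "*", "e": "1"}

additive = apply_morphism(group, from_group)
multiplicative = apply_morphism(monoid, from_monoid)

# Merge both imports into one ring signature; the carrier R* is derived
# from R, so it is not listed as a separate component.
ring = additive + tuple(s for s in multiplicative if s != "R*")
print(ring)
```

The two imports contribute the additive and the multiplicative part of the ring signature respectively.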



4.2 OMDoc Theories and OpenMath Content Dictionaries 

In the examples we have already seen that OMDoc documents contain definitions
of mathematical concepts, which need to be referred to using OpenMath
symbols. In particular, documents describing theories like barshe.omdoc
or ida.omdoc even reference OpenMath symbols they define themselves. Thus
it is necessary to generate OpenMath content dictionaries from OMDoc documents,
or develop an alternative mechanism to establish symbol identity in OMS.
The generation of content dictionaries is already supported in the MBase sys- 
tem, but can also be achieved by writing specialized XSL style sheets. For the 
purposes of this paper, we will only assume that one of these measures has been 
taken. 



5 Mathematical Elements 

We will now present the mathematical elements that are not integral parts of a 
theory, since they are optional (they can be derived from the material specified 
in the theory). We have the following elements: 

FMP This is the general element for representing mathematical formulae as
OpenMath objects, for instance the formula in Fig. 2. As logical formulae
often come as sequents, i.e. a conclusion is drawn from a set of assumptions,
OMDoc also allows the content of an FMP to be a (possibly empty)
set of assumption elements followed by a conclusion. The intended meaning
is that the FMP asserts that the conclusion is entailed by the assumptions
in the current context. As a consequence, <FMP>A</FMP> is equivalent
to <FMP><conclusion>A</conclusion></FMP>. The assumption and
conclusion elements allow the content to be specified by an OpenMath object
(OMOBJ) or in natural language (using CMPs).
assertion This is the element for all statements (proven or not) about math- 
ematical objects (see Fig. 7). Traditional mathematical documents discern 
various kinds of these: theorems, lemmata, corollaries, conjectures, problems, 
etc. These all have the same structure (formally, a closed logical formula). 
Their differences are largely pragmatic (theorems are normally more impor- 
tant in some theory than lemmata) or proof-theoretic (conjectures become 
theorems once there is a proof). Therefore, we represent them in the gen- 
eral assertion element and leave the type distinction to a type attribute. 
These type specifications in OMDoc documents should only be regarded
as defaults, since e.g. reusing a mathematical paper as a chapter in a larger
monograph may make it necessary to downgrade a theorem (e.g. the main
theorem of the paper) and give it the status of a lemma in the overall work.



<assertion id="ida.c6slp4.11" type="lemma">
  <CMP> A semi-group has at most one unit.</CMP>
</assertion>



Fig. 7. An assertion about semigroups 



alternative-def Since there can be more than one definition per symbol,
OMDoc supplies the alternative-def element. It not only contains the new
definition, but also points to two assertions that state the equivalence with
definitions of the concepts that are already known.
example In mathematical practice, examples play an equally great role as proofs,
e.g. in concept formation (as witnesses for definitions, or as either supporting
evidence or as counterexamples for conjectures). Therefore, examples are
given status as primary objects in OMDoc. Conceptually, we model an
example for a mathematical concept $C$ as a triple $(\mathcal{W}, A, \mathcal{P})$, where
$\mathcal{W} = (\mathcal{W}_1, \ldots, \mathcal{W}_n)$ is an $n$-tuple of mathematical objects, $A$ is an assertion of
the form $A = \exists W_1 \ldots W_n.\,B$, and $\mathcal{P}$ is a proof that shows $A$ by exhibiting
the witnesses $\mathcal{W}_i$ for $W_i$. The example $(\mathcal{W}, \exists W_1 \ldots W_n.\,\neg B, \mathcal{P})$ is a counterexample
to a conjecture $T := \forall W_1 \ldots W_n.\,B$, and $(\mathcal{W}, A, \mathcal{P}')$ a supporting
example for $T$.

OMDoc specifies this intuition in an element example that contains a set 
of OpenMath objects (the witnesses), and has the attributes 

- for (the concept or assertion for which it is an example),
- type (one of the keywords for or against, indicating the function of the example),
- assertion (a reference to the assertion A mentioned above),
- proof (a reference to the constructive proof P).

Consider for instance the structure $\mathcal{W} := (A^*, \circ)$ of the set of
words over an alphabet $A$ together with word concatenation $\circ$. Then
$(\mathcal{W}, \exists W.\,\mathrm{monoid}(W), \mathcal{P}_1)$ is an example for the concept of a monoid (with the
empty word as the neutral element), if e.g. $\mathcal{P}_1$ uses $\mathcal{W}$ to show the existence
of $W$. The example $(\mathcal{W}, \exists V.\,\mathrm{monoid}(V) \wedge \neg\mathrm{group}(V), \mathcal{P}_2)$ uses $\mathcal{W}$ as a counterexample
to the conjecture $\mathcal{C} := \forall V.\,\mathrm{monoid}(V) \Rightarrow \mathrm{group}(V)$, since $\mathcal{P}_2$ uses $\mathcal{W}$ as
a witness for $V$. Fig. 8 gives the OMDoc representation of this example of
an example.



<example id="mon.ex1" for="monoid" type="for"
         assertion="strings-are-monoids" proof="sam-pf">
  <CMP>The set of strings with concatenation</CMP>
  <OMOBJ><OMS cd="simple-monoids" name="strings"/></OMOBJ>
</example>
<example id="mon.ex2" for="monoid" type="against"
         assertion="monoids-are-groups" proof="mag-pf">
  <CMP>The set of strings with concatenation is not a group</CMP>
  <OMOBJ><OMS cd="simple-monoids" name="strings"/></OMOBJ>
</example>



Fig. 8. An OMDoc representation of an example 
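
The role of the witness in Fig. 8 can be made concrete with a small Python experiment on words over a two-letter alphabet. This is a sketch on a finite sample only, so it gives supporting evidence rather than a proof; the alphabet, the length bound, and all helper names are assumptions made for illustration.

```python
from itertools import product


def words(alphabet, max_len):
    """All words over `alphabet` up to length `max_len`, including the empty word."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]


def concat(u, v):
    """Word concatenation, the operation of the structure."""
    return u + v


sample = words("ab", 2)
empty = ""

# Supporting evidence for the monoid property: associativity and
# neutrality of the empty word, checked on the finite sample.
associative = all(concat(concat(u, v), w) == concat(u, concat(v, w))
                  for u in sample for v in sample for w in sample)
neutral = all(concat(empty, u) == u == concat(u, empty) for u in sample)

# Evidence against the group property: no sampled word is an inverse of "a",
# since concatenation never shortens a word.
has_inverse = any(concat("a", v) == empty for v in sample)

print(associative, neutral, has_inverse)
```

The same witness thus serves both examples: it supports the monoid assertion and refutes the conjecture that every monoid is a group.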



Finally, there are OMDoc elements that support structuring the knowledge in 
theories. We have already seen the possibility to define (parts of) theories by
so-called theory morphisms specified in imports and include elements in section
4.1. Following Hutter’s development graph [Hut99], we can use the knowledge
about theories to establish so-called inclusion morphisms that establish the
source theory as included (modulo renaming by a morphism) in the target theory.
This information can be used to add further structure to the theory graph
and help maintain the knowledge base with respect to changes of individual 
theories. 

An axiom-inclusion element contains a morphism (see section 4.1), and the 
attributes from and to specify the source and target theories. For any axiom in 
the source theory there must be an assertion in the target theory (whose FMP 
is just the image of the FMP of the axiom under the morphism) with a proof. 
These are represented by an empty by element, which has the attributes axiom, 
assertion, and proof with the obvious meanings. 

A theory-inclusion is a global variant of axiom-inclusion that can be
obtained as a path of axiom-inclusions (or other theory-inclusions), which
are specified in the by attribute.
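
The proof obligation attached to an axiom-inclusion (every axiom of the source theory needs a by element naming an assertion and a proof) can be checked mechanically. The Python sketch below assumes that axioms carry id attributes and that theories sit directly under a common root element; the sample document, its identifiers, and the function uncovered_axioms are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<omdoc>
  <theory id="monoid">
    <axiom id="assoc"/>
    <axiom id="neut"/>
  </theory>
  <axiom-inclusion id="mon-in-strings" from="monoid" to="strings">
    <by axiom="assoc" assertion="strings-assoc" proof="strings-assoc-pf"/>
  </axiom-inclusion>
</omdoc>
"""


def uncovered_axioms(root, inclusion):
    """Axioms of the source theory that lack a complete `by` element."""
    source_id = inclusion.get("from")
    source = root.find(f"theory[@id='{source_id}']")
    axioms = {a.get("id") for a in source.findall("axiom")}
    covered = {b.get("axiom") for b in inclusion.findall("by")
               if b.get("assertion") and b.get("proof")}
    return sorted(axioms - covered)


root = ET.fromstring(DOC)
inclusion = root.find("axiom-inclusion")
missing = uncovered_axioms(root, inclusion)
print(missing)
```

In this sample the neutral-element axiom has no by element yet, so the inclusion is not fully justified.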

6 Proofs

Proofs are representations of evidence for the truth of an assertion. As in the case
of definitions, there can in general be more than one proof for a given assertion.
Furthermore, it will initially be infeasible to fully formalize all mathematical
proofs needed for the correctness management of the knowledge base in one
universal proof format. Therefore, OMDoc supports a proof format whose structural
and formal elements are derived from the PDS structure developed for the
ΩMEGA system, but which also allows natural language representations at every level.
In the future, it may be necessary and advantageous to allow various other proof
representations there, like proof scripts (ΩMEGA replay files, Isabelle proof
scripts, ...), references to published proofs, resolution proofs, etc., to enhance
the coverage.

This mixed representation enhances multi-modal proof presentation [Fie97], 
and the accumulation of proof information in one structure. Informal proofs 
can be formalized [Bau99]; formal proofs can be transformed to natural lan- 
guage [HF96]. 

The OMDoc proof environment contains a list of proof steps. Each such derive
step has an id attribute (so that it can be referred to) and an optional type
attribute. It can contain the following child elements (in this order):

CMP This gives the natural language representation of the proof step. 

The rest of the children form the formal content of the derive step. Together, 
they represent the information present e.g. in a PDS node.

FMP A formal representation of the assertion made by this proof step, they contain
CMP and FMP elements. Local assumptions from the FMP should not be
referenced outside the derive step they were made in. Thus the derive step
serves as a grouping device for local assumptions.
method is an OpenMath symbol representing a proof method or inference rule
that justifies the assertion made in the FMP element.
premise These are empty elements whose xref attribute is used to refer to the
proof or local assumption nodes that the method was applied to in order to yield
this result. These attributes specify the DAG structure of the proof.
proof If a derive step is a logically (or even mathematically) complex step that
can be expanded into sub-steps, then the embedded proof element can be
used to specify the sub-derivation (which can have similar expansions in
embedded proof environments again).

This embedded proof allows us to specify generic markup for the hierarchic 
structure of proofs. 



The Proof plan Data Structure (PDS) was introduced in the ΩMEGA [BCF+97]
system to facilitate hierarchical proof planning and proof presentation at more than
one level of abstraction. In a PDS, expansions of nodes justified by tactic applications
are carried out, but the information about the tactic itself is not discarded in the
process as in tactical theorem provers like Isabelle or NuPRL. Thus proof nodes
may have justifications at multiple levels of abstraction in a hierarchical proof data
structure.

<derive id="barshe.2.1.2.proof.a.proof.D2.1">
  <CMP>By <OMOBJ><OMS cd="barshe" name="alg-prop-reals.A2"/></OMOBJ>
    we have z + (a + (-a)) = (z + a) + (-a)
  </CMP>
  <conclusion>(z + a) + (-a) = z + (a + (-a))</conclusion>
  <method><OMS cd="omega-base-calc" name="foralli*"/>
    <parameter><OMOBJ><OMV name="z"/></OMOBJ></parameter>
    <parameter><OMOBJ><OMV name="a"/></OMOBJ></parameter>
    <parameter>-a</parameter>
  </method>
  <premise xref="alg-prop-reals.A2"/>
</derive>



Fig. 9. A derive proof step 
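
Since the premise elements determine the DAG structure of a proof, that structure can be recovered from the xref attributes alone. The following Python sketch builds the premise graph of a small, made-up proof and reports references that point neither to an earlier derive step nor to a node outside the proof. The sample proof, the step identifiers, and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

PROOF = """
<proof id="p">
  <derive id="D1"><premise xref="A2"/></derive>
  <derive id="D2"><premise xref="D1"/></derive>
  <derive id="D3"><premise xref="D1"/><premise xref="D2"/></derive>
</proof>
"""


def premise_graph(proof):
    """Map each derive step id to the list of node ids it depends on."""
    return {d.get("id"): [p.get("xref") for p in d.findall("premise")]
            for d in proof.findall("derive")}


def dangling_references(graph, external):
    """Premises pointing neither to an earlier step nor to an external node."""
    seen, dangling = set(external), []
    for step, premises in graph.items():
        dangling += [(step, p) for p in premises if p not in seen]
        seen.add(step)
    return dangling


graph = premise_graph(ET.fromstring(PROOF))
print(graph)
print(dangling_references(graph, external={"A2"}))
```

Requiring premises to refer to earlier steps is a simplification; it rules out cycles without a separate graph traversal.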



7 Auxiliary Elements 

In this section we will present OMDoc elements that are not strictly mathemat- 
ical content, but have useful functions in mathematical documents or knowledge 
bases. For the OMDoc representations of things like exercises we refer the reader 
to [Koh00b] and concentrate on the representation of applets and presentation
information instead. 



7.1 Non-XML Data and Program Code in OMDoc 

Sometimes mathematical services have to be able to communicate (e.g. to the 
MBase system for storage) data in non-XML syntax, or whose format is not
sufficiently fixed to warrant a general XML encoding. Examples of this are
pieces of program code, like tactics of tactical theorem provers, linguistic data
of proof presentation systems, etc. One characteristic of such data seems to be
that it is private to certain applications, but may be relevant to more than one 
user. For this, OMDoc provides the private element, which contains the usual 
CMPs and a data element described below. It has the attributes 

pto specifies the system to which the data are private. 

pto-version is its version; specifying this may be necessary if the data or even
their format change with versions.

format/type the type of the data and the format the data are in; the meaning
of these fields is determined by the system itself.

requires specifies the identifiers of the elements that the data depend upon, 
which will often be code elements. 

theory allows the specification of the mathematical theory (see section 4) that 
the data is associated with. 

The data element contains the data in a CDATA section (this is the XML
way of allowing data that cannot be parsed by the XML parser). If the content
of this field is too large to store directly in the OMDoc or often changes, then
it can be substituted by a link, specified in the xref attribute.

The code element is for embedding pieces of code into an OMDoc document. 
This element has the same attributes as the private element and, like it, can
contain CMP and data elements. Furthermore, it can contain documentation
elements input, output and effect that specify the behavior of the procedure 
defined by the code fragment. 
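
To illustrate why the payload is wrapped in a CDATA section, the following Python sketch parses a code element whose data contains characters that would otherwise be read as markup. The attribute values pto="mozilla" and format="js" and the JavaScript payload are invented for this example; only the element and attribute names come from the description above.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<code id="callmint" pto="mozilla" format="js">
  <input>None</input>
  <output>The result</output>
  <effect>None</effect>
  <data><![CDATA[function callMint() { return 1 < 2 && "ok"; }]]></data>
</code>
"""

code = ET.fromstring(FRAGMENT)

# The parser hands back the CDATA content verbatim, including the
# characters < and & that would otherwise need escaping.
payload = code.findtext("data")
print(payload)
```

Without the CDATA wrapper, the same payload would have to be written with the entities &lt; and &amp; to keep the document well-formed.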

7.2 Applets in OMDoc 

omlet elements contain OMDoc specifications of applets (program code that
can in some way be executed during document manipulation). omlets generalize
the well-known applet concept in two ways: The computational engine is not
restricted to plug-ins of the browser (current servlet technology can be used 
and specified using code and omlet elements in OMDocs) and the program 
code can be specified and distributed more easily, making document-centered 
computation easier to manage. 



<code id="callmint">
  <input>None</input>
  <output>The result</output>
  <effect>None</effect>
  <data><![CDATA[... the call-mint code goes here ...]]></data>
</code>
<derive id="monp_l">
  <CMP> <omlet type="js" function="callMint">Intros.</omlet></CMP>
  <method><OMS name="Intros" cd="COQ"/></method>
</derive>



Fig. 10. An omlet 



Like the HTML applet tag, the omlet element can be used to wrap any (set
of) well-formed elements. It has the following attributes.

type This specifies the computation engine that should execute the code. Depending
on the application, this can be a programming language, such as
javascript (js) or Oz, or a process that is running (in our case the LΩUI
or ΩMEGA services).

function The code that should be executed by the omlet is specified in the 
function attribute. This points to an OMDoc code element that is acces- 
sible in some way (e.g. in the same OMDoc). This indirection allows us to 
reuse the machinery for storing code in OMDocs. For a simple example see 
Fig. 10. 

argstr allows specification of an (optional) argument string for the function. A
call to the LΩUI interface would then have the form in Fig. 11. Here, the
code in the code element sendtoloui (which we have not shown) would be
Java code that simply sends the argstr to LΩUI's remote control port.

The expected behavior of the omlet can be implemented in the XSL style sheet, 
which in the case of e.g. translation to Mozilla will put the callmint code 
directly into the generated html. 



<CMP> Let's prove it
  <omlet id="bla" type="java" function="sendtoloui"
         argstr="load(problem='monoid_uniq)">
    interactively
  </omlet>
</CMP>



Fig. 11. An omlet calling an external process 



7.3 Presentation 

In the introduction we have stated that one of the design intentions behind OMDoc
is to separate content from presentation, and leave the latter to the user.
In this section, we will briefly touch upon presentation issues. The technical side
of this is simple: OMDoc documents are regular XML documents that can be
processed by XSL [Dea99] style sheets to produce conventional presentations from
OMDoc representations of mathematical documents. At the moment, we have
XSL style sheets to convert OMDoc to HTML (one each specialized to the respective
browsers), LaTeX, and to the input languages of the ΩMEGA, InKa, and
λClam systems (they can be found at http://www.mathweb.org/ilo/omdoc).
At the moment, these hard-code certain presentation decisions for the overall 
appearance of the documents, but we are working on style sheet generators that 
make these user-adaptive. 

The mathematical concepts and symbols introduced in an OMDoc docu- 
ment (symbol elements) often carry typographic conventions, which cannot be 
determined by general principles alone. Therefore, they need to be specified in 
the document itself, so that typographically good representations can be gen- 
erated from this (and subsequent) documents. The presentation element in 
Fig. 12 allows the addition of XSL style sheet information to symbols, where they 
are defined. In this case, the style sheet information will cause an OpenMath 
expression 

<OMA>
  <OMS cd="ida" name="monoid"/><OMV name="M"/><OMV name="o"/><OMV name="e"/>
</OMA>

to be rendered as $(M, \circ, e) \in \mathbf{MON}$ in a TeX or LaTeX document derived from
ida.xml via a suitable XSL style sheet. Of course, this information will need to

<presentation format="TeX">
  <xsl:template match="OMA[OMS[position()=1 and
                               @name='monoid' and
                               @cd='ida.monoid']]">
    (<xsl:apply-templates select="*[2]"/>,
     <xsl:apply-templates select="*[3]"/>,
     <xsl:apply-templates select="*[4]"/>)\in{\bf MON}
  </xsl:template>
</presentation>



Fig. 12. XSL Presentation for the symbol in Fig. 3 



be included into the respective style sheets. This is easily realized by a two-stage 
style sheet process: in the first pass, a general (higher-order) style sheet extracts 
the presentation information from the relevant OMDoc documents, and in the 
second stage, this is used to present the OMOBJs in the source OMDoc. 
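
The two-stage idea can be imitated in a few lines of Python. Note that this sketch replaces the XSL machinery by plain format strings: the first stage is represented by a hand-written dictionary of per-symbol templates (standing in for the extracted presentation information), and the second stage applies them to an OpenMath object. The dictionary, the function render, and the restriction to OMA and OMV nodes are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

# Stage 1 (stand-in): presentation information collected per (cd, name).
# Doubled braces are literal braces in str.format templates.
templates = {("ida", "monoid"): r"({0}, {1}, {2})\in{{\bf MON}}"}

OBJ = """
<OMA>
  <OMS cd="ida" name="monoid"/><OMV name="M"/><OMV name="o"/><OMV name="e"/>
</OMA>
"""


def render(node):
    """Stage 2: present an OpenMath node using the collected templates."""
    if node.tag == "OMV":
        return node.get("name")
    if node.tag == "OMA":
        head, *args = list(node)
        template = templates[(head.get("cd"), head.get("name"))]
        return template.format(*(render(a) for a in args))
    raise ValueError(f"unsupported element: {node.tag}")


tex = render(ET.fromstring(OBJ))
print(tex)
```

Keeping the templates in a separate table mirrors the separation between the symbol-specific presentation elements and the generic style sheet that applies them.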

The presentation elements discussed up to now allow specification of the
presentation of OpenMath elements. To specify the overall structure of mathematical
texts, such as books, chapters, sections, or paragraphs, but also enumerations,
itemizations, and lists, we use the omgroup element. We use a general construct
that specifies the presentation in the type attribute, since the presentation component
(style sheet) may need to decide on that. omgroup elements contain an
optional metadata element and then a sequence of omgroup and ref elements.
Elements of the first kind allow the definition of a recursive document structure, and elements of
the second kind are used to refer to other OMDoc elements by the use of xlink
attributes (most notably xlink:href for hyperlinks).

Note that this representation, which relies on explicit (hyper-)references instead
of nesting information, allows the specification of more than one document
using the mathematical material specified in the other OMDoc elements. In
particular, it becomes possible to specify and store more than one linearization
of the material in a document, or to generate linearizations or “guided tours”
(see [SBC+00] for details).
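
A linearization in this sense is just a traversal of nested omgroup elements that collects the targets of their ref children. The Python sketch below shows one way to do this; the sample document, the type value "sequence", and the convention that xlink:href holds a fragment identifier of the form #id are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

DOC = f"""
<omdoc xmlns:xlink="{XLINK}">
  <definition id="mon.d1"/>
  <assertion id="mon.a1"/>
  <omgroup id="tour" type="sequence">
    <ref xlink:href="#mon.d1"/>
    <omgroup type="sequence">
      <ref xlink:href="#mon.a1"/>
    </omgroup>
  </omgroup>
</omdoc>
"""


def linearize(group):
    """Flatten nested omgroup elements into the list of referenced ids."""
    order = []
    for child in group:
        if child.tag == "omgroup":
            order.extend(linearize(child))
        elif child.tag == "ref":
            order.append(child.get(f"{{{XLINK}}}href").lstrip("#"))
    return order


root = ET.fromstring(DOC)
tour = linearize(root.find("omgroup"))
print(tour)
```

Because the content elements are only referenced, a second omgroup could present the same material in a different order without duplicating it.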

8 Conclusion 

We have proposed an extension to the OpenMath standard that allows the
representation of the semantics and structure of various kinds of mathematical
documents, including articles, textbooks, interactive books, and courses. We have
motivated and described the language and presented an XML document type
definition for it.

We are currently testing this in the development of a user-adaptive interactive 
book including proof explanation based on IDA [CCS99] in close collaboration 
with the authors. This case study unites several of the application areas dis- 
cussed in the introduction. The re-representation of IDA in the OMDoc format 
makes it possible to machine-understand the structure of the document, read 
it into the MBase [FK00, KF00] knowledge base system without loss of information,
preserving the structure, and generate personalized sub-documents or
linearizations of the structured data based on a simple user model. Furthermore,
the OMDoc representation supports the formalization of (parts of) the mathematical
knowledge in IDA and makes it accessible to the ΩMEGA mathematical
assistant system [BCF+97], which can find proofs that solve some of the prob- 
lems either fully automatically (by proof planning) or in interaction with the 
authors. This newly developed stock of formal data (it is not present in IDA 
now) will enable the reader to read and experiment with the proofs behind the 
mathematical theory, much as she can in the present version with the integrated 
computer algebra system GAP [S+95]. Finally, OMDoc will serve as the input
format for the Lima system (see [Bau99]), an experimental natural language 
understanding system specialized to mathematical texts (this can be used to 
develop formalization in FMPs from the text in the respective CMPs). 

In the context of this project, we have developed first authoring tools for 
OMDoc that try to simplify generating OMDoc documents for the working
mathematician. There is a simple OMDoc mode for emacs, and a LaTeX
style [Koh00a] that can be used to generate OMDoc representations from LaTeX
sources and thus help with the migration of existing mathematical documents.
A second step will be to integrate the LaTeX to OpenMath conversion tools.
Michel Vollebregt has built a program that traverses an OMDoc and substitutes 
various representations for formulae (including the Mathematica, GAP, and 
Maple representations) by the corresponding OpenMath representations. 

Abstract. There is a wealth of interactive mathematics available on
the web. Examples range from animated geometry to computing the
digit in the expansion of π. However, proofs seem to remain static and at
most they provide interaction in the form of links to definitions and other
proofs. In this paper, we want to show how interactivity can be included
in proofs themselves by making them executable, human-readable, and
yet formal. The basic ingredients are formal proof-objects, OpenMath-related
languages, and the latest eXtensible Markup Language (XML)
technology. We exhibit, by an example taken from a formal development
in number theory, the final product of which we believe to be a truly
interactive mathematical document.



Keywords: Interactive mathematical documents, Formal mathematical proofs,
Type theory, Markup languages.



1 Introduction 

One may broadly classify the many examples of online mathematical documents 
promising interactivity into two categories: textual documents that are hyper- 
linked and allow readers to consult several pieces of information by comfortably 
clicking through the links, and documents that are interfaces to software agents 
performing computations (for instance, to visualize graphically a surface, to solve 
a system of equations or to find references of papers in which some integer se- 
quence occurred). Both kinds of documents are considered interactive in that 
the user is actively involved in producing the final reading material. Unfortu- 
nately each document has its own notation and if mathematical objects are used, 
then they can hardly be directly utilized in a different setting. In this form, the 
mathematical knowledge cannot be shared. 

Although Java applets and cgi-scripts have provided the support for em- 
bedding graphical and computational facilities into an interactive mathematical 
document, little interactivity is currently available in proofs. One area in which 
applets have actually been used in “proofs” is planar geometry [25] where they 
can be applied naturally. If we adhere to the idea that the essence of doing math- 
ematics is proving, then we must conclude that mathematical knowledge is still 
poorly communicated interactively. On the other hand, symbolic computation
systems, like computer algebra packages and automated proof environments, can 
play a key role in supporting the development of interactive proofs, to generate 
the content, validate it, and act as back engines during online presentation. 

In this paper we present a possible approach. We consider tactics-based proof 
assistants, in the tradition of the AUTOMATH project [23], thus excluding
resolution-based theorem provers. We focus on type-theoretical proof assistants 
such as Coq, Lego, and NuPRL [11, 22, 10], yet some of the technologies involved 
can be applied to proof plans in the general sense. 

The main appeal of using proofs developed in type-theoretical systems is 
that proofs are terms in the formal language of the system. Proving occurs 
by interaction with the system through specialized user-friendly interfaces [20] 
designed to produce formal proofs. These formal proofs are often too detailed and 
become hard to read. This explains the many efforts made in producing more 
natural descriptions of the resulting proofs [12, 2, 21, 17]. Still, these efforts 
produce only “flat text” akin to conventional informal proofs and not directly 
suited for supporting interaction. In fact, such text is meant to be read by 
humans and not by programs. For interactive use, a published proof also needs 
to access the original formal proof that produced the text; the flat text version 
of the proof alone is not enough. 

In our approach, presentation and content are kept distinct. The formal proof 
is used as content, and multiple views, i.e. multiple presentations of the same 
content, are generated. In order to do so, the formal proof is encoded using 
OpenMath [9], a standard markup language for mathematical content. The over- 
all mathematical document is generated automatically in the OpenMath Docu- 
ment [19] format. This allows the inclusion of OpenMath objects such as formal 
proofs, flat text containing informal mathematics and tactic scripts. In this way, 
sharing of mathematical knowledge and transparent interaction with computa- 
tional tools are achieved. In the paper we present some of the technologies that 
we have developed and that are currently being used for the next version of 
“Algebra Interactive!” [8, 6]. 

The paper is structured as follows. Section 2 discusses the different options 
for including proofs in interactive mathematical documents. In Section 3, some 
of the basic knowledge needed to understand type-theoretical theorem proving 
is briefly recalled using an example. The technologies involved in bringing proofs 
online are described in Section 4. The concluding remarks are found in Section 5. 



2 Proofs in Interactive Mathematical Documents 

In this section we study the options for embedding proofs within an interactive 
mathematical document. 

The first and most obvious is to include a textual version of the proof in 
natural language, similar to proofs in traditional (paper) mathematical texts. 
Indeed, one of the first tools used to publish mathematics on the web was a 
converter from TeX to HTML. Nowadays, most word processing software is able

                      NL Text   Tactics Scripts   Proof Objects
OM encodable             -             -                +
mathematical object      -             -                +
readable                 +            +/-               -
machine checkable        -            +/-               +



Table 1. Pros and cons of different proof embedding options. 



to output HTML and in some cases the mathematics is not replaced by an image 
(a gif file) but by its presentation in the MathML language [3]. 

Although this form of presentation does contain structure, for example ref- 
erences to lemmata or definitions, the format is usually plain text. Because it 
consists of mathematical vernacular, it is very easy to read but hard to recon- 
struct a fully formal proof from it. Having the mathematical content semantically 
marked-up is desirable for two reasons. First, it can be communicated to compu- 
tational software such as a computer algebra system or a proof checker. Second, 
the structure can be used to open different views on the proof. Allowing the user 
to change views is one form of interactivity we achieve in our approach. 

The second way to embed proofs is to include high-level tactics scripts which 
serve as input for theorem provers. This approach is opposite to the former one 
in that it takes the notion of proof from a theorem prover, i.e. a formal system, 
instead of from informal mathematics. Such scripts are system dependent and, 
in general, it is easy to communicate the proof to the specific system and have it 
checked. Users of the system usually regard tactics scripts as real proofs and do 
not find it difficult to understand them even though they show only one half of 
a dialogue. In recent work, high-level tactic scripts are used to produce natural 
language verbalization of the proof [17]. 

The third way to embed proofs is to include a formal proof, for instance 
a derivation tree. In a type-theoretical setting, the corresponding lambda term 
is a good candidate for the formal representation of the proof. Because it is a 
mathematical term, it can be encoded using a markup language for mathemat- 
ics and it or parts of it can be communicated to computational software in a 
standard way. A major drawback of this option is that formal proofs contain 
many details and hence are not very readable; moreover, their marked-up encoding 
becomes even larger. However, this drawback can be overcome by creating 
suitable views on the formal tree leaving out the undesired details and adding 
information to clarify the formal structure, see for instance our natural language 
view in Figure 1. 

Table 1 sums up the pros and cons of the above three options. 

In this paper we propose to mix the different options, thus retaining the read- 
ability of textual proofs, the ability to step through the proof interactively, and 
the rigor of formal proofs. Type theoretical systems are based on the “proposi- 
tions as types” paradigm which is recalled in the next section through examples. 



3 Example of a Formalization in Type Theory 

In this section we briefly sketch the process of formally developing a mathemat- 
ical theory in a proof assistant. The examples we use arise during the formal- 
ization, done in Coq, of Pocklington’s Criterion [24, 7, 4] for finding whether a 
positive number is prime. 

In the type theory of Coq, mathematical concepts are encoded as typed 
expressions. For instance, one can introduce the notions of division and primality 
of natural numbers by the following definitions. Divides is defined in the usual 
way as a predicate on two natural numbers n and m (usually denoted as n | m), 
and Prime is defined as a predicate on a natural number n. 

Definition Divides: nat -> nat -> Prop := 
  [n,m:nat] (EX q:nat | m=(mult n q)). 

Definition Prime: nat -> Prop := 
  [n:nat] (gt n (1)) /\ (q:nat) (Divides q n) -> q=(1) \/ q=n. 

Besides concepts from the mathematical object language, also concepts from 
the meta language such as theorems and proofs are encoded as typed expres- 
sions. This principle is called the Curry-Howard correspondence, also known as 
propositions as types. 

A proof is encoded as an object which has, as its type, the encoding of 
the statement of the theorem. A typed expression which represents a proof is 
called a proof-object. If an expression of type Prop has more than one inhabitant, 
those inhabitants represent different proof-objects of the proposition. If it has 
no inhabitants, then it cannot be proved. 

Consider as example the observation that if all prime divisors of a positive 
number n are greater than √n, then the number n is prime. This can be stated 
in Coq as a new theorem (primepropdiv) that uses library functions (mult and 
gt) and newly defined predicates (Divides and Prime): 

Lemma primepropdiv: (n:nat)(gt n (1)) -> 

( (q:nat) (Prime q) -> 

(Divides q n) -> 

(gt (mult q q) n) ) -> 

(Prime n) . 
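
As an informal sanity check of this statement (not a proof, and not part of the Coq development), the following Python sketch mirrors the definitions of Divides and Prime and tests the implication by brute force for small n. The function names and the bound 60 are illustrative assumptions.

```python
def divides(n: int, m: int) -> bool:
    """Divides n m: there is a natural number q with m = n * q."""
    return any(m == n * q for q in range(m + 1))


def prime(n: int) -> bool:
    """Prime n: n > 1 and every divisor q of n is 1 or n."""
    return n > 1 and all(q in (1, n) for q in range(n + 1) if divides(q, n))


def hypothesis(n: int) -> bool:
    """Every prime divisor q of n satisfies q * q > n."""
    return all(q * q > n for q in range(n + 1) if prime(q) and divides(q, n))


def primepropdiv_holds(n: int) -> bool:
    """The implication stated by the lemma, for a single n > 1."""
    return (not hypothesis(n)) or prime(n)


# All n in 2..60 for which the implication was confirmed.
checked = [n for n in range(2, 61) if primepropdiv_holds(n)]
```

Such a test only covers finitely many cases; the proof-object discussed in this section is what establishes the lemma for all n.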



To assist the user in constructing proof-objects, Coq uses a high-level lan- 
guage of tactics. A user of Coq, trying to prove primepropdiv, would be pre- 
sented with the initial goal and interactively arrive at the following script. 



Intros. Elim (primedec n). Intro. Assumption. 
Intros. Elim (nonprime_primewitness n). 
Intros. Elim H2. Intros. Elim H4. Intros. 
Elim H6. Intros. Elim (le_not_lt (mult x x) n). 
Assumption. Unfold gt in H0. Apply H0. 
Assumption. Assumption. Assumption. Assumption. 

The script uses lemmata from the Coq arith library and previously proven 
lemmata, like primedec which proves the decidability of the prime predicate and 
nonprime_primewitness which proves that non-prime numbers greater than 1 
have prime-divisors. The proof-object primepropdiv is given below. 



[n:nat; 
 H:(gt n (1)); 
 H0:((q:nat)(Prime q)->(Divides q n)->(gt (mult q q) n))] 
(or_ind (Prime n) ~(Prime n) (Prime n) [H1:(Prime n)]H1 
  [H1:(~(Prime n))] 
  (ex_ind nat 
    [d:nat](lt (1) d) /\ (le (mult d d) n) /\ (Divides d n) /\ (Prime d) 
    (Prime n) 
    [x:nat; 
     H2:((lt (1) x) /\ (le (mult x x) n) /\ (Divides x n) /\ (Prime x))] 
    (and_ind (lt (1) x) (le (mult x x) n) /\ (Divides x n) /\ (Prime x) 
      (Prime n) 
      [_:(lt (1) x); 
       H4:((le (mult x x) n) /\ (Divides x n) /\ (Prime x))] 
      (and_ind (le (mult x x) n) (Divides x n) /\ (Prime x) 
        (Prime n) 
        [H5:(le (mult x x) n); H6:((Divides x n) /\ (Prime x))] 
        (and_ind (Divides x n) (Prime x) (Prime n) 
          [H7:(Divides x n); H8:(Prime x)] 
          (False_ind (Prime n) 
            (le_not_lt (mult x x) n H5 (H0 x H8 H7))) H6) H4) H2) 
    (nonprime_primewitness n H H1)) (primedec n)) 



If available, such a proof-object can be used in an interactive document de- 
scribing Pocklington’s Criterion in several ways. First of all, it provides evidence 
of the truth of the assertion it proves: such a term can be easily checked by Coq 
to be of type primepropdiv. If it is encoded in a system-independent standard 
language such as OpenMath, then it can be shared. Moreover, it can also be used 
to produce a natural language view of the proof it represents since in its current 
form it is not readable. Part of our technology includes a tool that, when given a 
context and a proof object, produces a natural language view of the associated 
proof. The tool can interactively adapt the level at which the proof is displayed 
by collapsing or expanding certain sentences. The major technologies involved 
are described in Section 4. 



4 Enabling Technologies 

This section introduces the core technologies developed to support our approach 
to proofs in interactive mathematical documents. The natural language view is 
strongly connected to formal proof objects based on type theory. OpenMath and 
related technologies enable the representation of structured mathematical infor- 
mation such as a proof term, its context, a tactics script and natural language 
explanations. 



4.1 The Natural Language View 

The natural language viewer, described in [?], extends the standard algorithm 
for translating Coq proof-objects presented in [13]. The input is a Coq context 
and a proof-object. The output obtained by the standard algorithm is a detailed 
natural language proof. Our implementation takes into account requirements for 
interactivity and improves upon it in two ways. 

First, instead of producing flat text, the final presentation is an adjustable 
view. It is generated from an object which is tightly connected to the original 
formal proof-object. This intermediate object contains both natural language 
parts and formal parts. When rendering such an object on the screen, the formal 
parts of a sentence may be treated differently. For example, whenever the name 
of a definition occurs, the renderer produces a hyperlink to the place in the 
context where the definition is introduced. 

Second, recursive calls of the translation algorithm result in a sentence which 
can be expanded or collapsed by the reader, similar to the display of a directory 
structure in the file browser of a modern GUI system. When the sentence is collapsed, 
instead of recursively translating the corresponding subproof, the renderer dis- 
plays the type of the subproof. Figure 1 shows a combination of the natural 
language view with Fitch-style natural deduction notation. 

The translation algorithm is rule based, and is driven by the structure of the 
proof-object. Some of the rules are given below. Note that the translation uses 
type inference to guide the translation process. Here M_τ on the left-hand side 
means “M is of type τ” and ▷(M) on the right-hand side means “recursively 
translate M when the folder is open, display the type of M when it is closed”. 

h_τ          ↦  “By” h “we have” τ 

(M N)_τ      ↦  ▷(M) 
                “By taking” N “for” x “we get” τ 

(M N)_τ      ↦  ▷(N) 
                ▷(M) 
                “We deduce” τ 

(λh:A.M)_τ   ↦  “Assume” A (h) 
                ▷(M) 
                “We have proved” τ 

(λx:A.M)_τ   ↦  “Consider an arbitrary” x “∈” A 
                ▷(M) 
                “We have proved” τ “since” x “is arbitrary” 

The sentences are kept as abstract as possible by not unfolding definitions 
unless needed, using expected types instead of derived types as done in [12] and 
by repeated introduction of variables in one sentence. 
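
The rule-driven translation with collapsible subproofs can be illustrated by a small Python sketch. It is only one possible reading of the sentence templates listed above, not the viewer described in this paper: the classes Var, App and Lam, the depth parameter that models how many folder levels are open, and the practice of storing each type as a plain string (instead of running type inference) are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:                      # hypothesis h of type typ
    name: str
    typ: str


@dataclass
class App:                      # application (fun arg) of type typ
    fun: "Term"
    arg: Union["Term", str]     # a proof term, or an object term given as text
    typ: str
    bound: str = "x"            # variable instantiated by a text argument


@dataclass
class Lam:                      # abstraction over var : dom, of type typ
    var: str
    dom: str
    body: "Term"
    typ: str
    dom_is_prop: bool           # True: var names a hypothesis; False: an object


Term = Union[Var, App, Lam]


def folder(term: Term, depth: int) -> List[str]:
    """Open folder: translate recursively. Closed folder: show only the type."""
    return translate(term, depth - 1) if depth > 0 else [term.typ]


def translate(term: Term, depth: int) -> List[str]:
    """Produce the sentences for term, opening at most depth nested folders."""
    if isinstance(term, Var):
        return [f"By {term.name} we have {term.typ}"]
    if isinstance(term, App):
        if isinstance(term.arg, str):
            return folder(term.fun, depth) + [
                f"By taking {term.arg} for {term.bound} we get {term.typ}"]
        return folder(term.arg, depth) + folder(term.fun, depth) + [
            f"We deduce {term.typ}"]
    if term.dom_is_prop:
        return [f"Assume {term.dom} ({term.var})"] + folder(term.body, depth) + [
            f"We have proved {term.typ}"]
    return [f"Consider an arbitrary {term.var} in {term.dom}"] + folder(
        term.body, depth) + [
        f"We have proved {term.typ} since {term.var} is arbitrary"]


# A proof of A->A, rendered with the inner folder open and closed.
identity = Lam("h", "A", Var("h", "A"), "A->A", dom_is_prop=True)
opened = translate(identity, depth=1)
closed = translate(identity, depth=0)
```

With the folder closed, the subproof is replaced by its type, which is the behaviour described above for collapsed sentences.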

Many of the usual symbols from mathematics and logic can be defined in 
type theory. The fact that notions such as the natural numbers and their induction 
principle are not primitive but definable is regarded as a strength of 
type-theoretical systems. In fact, even connectives like ∧, ∨, and ∃, and relations 
like = (general Leibniz equality) and < (on ℕ) are all definable in Coq. 
They are defined in the standard library together with their introduction and 
elimination rules. In theory, it is sufficient to add to the above rules translation 
rules for inductive types and their inhabitants. However, the translation algo- 
rithm treats some defined objects as primitives in order to choose a translation 
as close as possible to the language of informal mathematics. 

The example from Section 3 uses many lemmata from the standard Coq 
library Arith. Some of these should also be treated as primitive by the natural 
language viewer since they would be considered trivial in informal mathematics. 



Lemma primepropdiv: For all n in nat: (n>1)->(For all q in nat: (Prime ... 

Consider an arbitrary n in nat 
  Assume n>1 (H) 
    Assume For all q in nat: (Prime q)->(Divides q n)->((q*q)>n) (H0) 
      (Prime n) ∨ ~(Prime n) 
      (Prime n)->(Prime n) 
      ~(Prime n)->(Prime n) 
      In any case, we have Prime n 
    We have proved (For all q in nat: (Prime q)->(Divides q n)->((q*q)>n))->(Prime ... 
  We have proved (n>1)->(For all q in nat: (Prime q)->(Divides q n)->((q*q)>n))->... 
We have proved For all n in nat: (n>1)->(For all q in nat: (Prime q)->(Divides q n ... 

Fig. 1. A Natural Language View of primepropdiv 



4.2 OpenMath, OpenMath Documents, and MathML 

The Extensible Markup Language (XML) is becoming an increasingly popular 
choice as a source language in which to represent semantically rich information 
that can be searched, stored and presented in different formats. The World Wide 
Web Consortium is currently recommending several technologies related to XML, 
and the next generation of browsers will be XML-enabled, that is, able 
to display XML documents directly. This section discusses how some of these 
technologies can be used to provide interactivity to “online” proofs. 

OpenMath The predominant XML languages for mathematics are MathML 
and OpenMath. Ideally these two languages complement each other: MathML- 
Presentation can be used for presenting mathematical content written in Open- 
Math. A detailed description of OpenMath is given in [9]. In this paper we assume 
a certain level of familiarity with the general OpenMath ideas and describe it 
by examples. 

OpenMath is a language for the representation of mathematical content. The 
symbols used in OpenMath objects are defined in XML documents called 
Content Dictionaries (CDs). Official CDs are available for public use from the 
OpenMath Society [28], but users may also write private CDs for symbols used in 
their own applications. As an example, consider the definition of the OpenMath symbol 
<OMS cd="pock" name="Divides"/>, denoted in short by pock:Divides, for 
representing the predicate Divides on natural numbers used in the example in 
Section 3. The formal definition of the example is represented as an OpenMath 
“defining mathematical property”, DefMP: 



<DefMP name="Divides"> 
<OMOBJ><OMBIND><OMS cd="lc" name="Lambda"/> 
  <OMBVAR><!-- n:N, m:N --> 
    <OMATTR><OMATP><OMS cd="icc" name="type"/> 
                   <OMS cd="setname" name="N"/> 
            </OMATP> <OMV name="n"/> </OMATTR> 
    <OMATTR><OMATP><OMS cd="icc" name="type"/> 
                   <OMS cd="setname" name="N"/> 
            </OMATP> <OMV name="m"/> </OMATTR> 
  </OMBVAR> 
  <OMBIND><OMS cd="quant1" name="exists"/> 
    <OMBVAR><!-- q:N --> 
      <OMATTR><OMATP><OMS cd="icc" name="type"/> 
                     <OMS cd="setname" name="N"/> 
              </OMATP> <OMV name="q"/> </OMATTR> 
    </OMBVAR><OMA><OMS cd="relation1" name="eq"/> 
      <OMV name="m"/> 
      <OMA><OMS cd="arith1" name="times"/> 
        <OMV name="n"/> 
        <OMV name="q"/> 
      </OMA></OMA> 
  </OMBIND></OMBIND> 
</OMOBJ> 
</DefMP> 
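
To illustrate how such an object refers to its Content Dictionaries, the following Python sketch parses only the innermost application of the definition (m = n * q) with the standard library and lists the (cd, name) pairs it uses. The helper names symbols and render, and the short cd:name output notation, are assumptions made for this example; the sketch covers only the OMA, OMS and OMV elements.

```python
import xml.etree.ElementTree as ET

# The body of the Divides definition, m = n * q, as an OpenMath application.
FRAGMENT = """
<OMOBJ>
  <OMA><OMS cd="relation1" name="eq"/>
    <OMV name="m"/>
    <OMA><OMS cd="arith1" name="times"/>
      <OMV name="n"/>
      <OMV name="q"/>
    </OMA>
  </OMA>
</OMOBJ>
"""


def symbols(xml_text):
    """List the (cd, name) pairs of all OMS elements in document order."""
    root = ET.fromstring(xml_text)
    return [(e.get("cd"), e.get("name")) for e in root.iter("OMS")]


def render(node):
    """Render OMOBJ/OMA/OMS/OMV nodes in a compact prefix notation."""
    if node.tag == "OMOBJ":
        return render(node[0])
    if node.tag == "OMV":
        return node.get("name")
    if node.tag == "OMS":
        return f'{node.get("cd")}:{node.get("name")}'
    if node.tag == "OMA":
        head, *args = list(node)
        return f'{render(head)}({", ".join(render(a) for a in args)})'
    raise ValueError(f"unsupported element: {node.tag}")


used = symbols(FRAGMENT)
text = render(ET.fromstring(FRAGMENT))
```

Here `text` is the string `relation1:eq(m, arith1:times(n, q))`, in which every symbol is qualified by the CD that defines it.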



The formal signature can be given similarly in terms of an OpenMath object. 

As mentioned before, a proof-object is a term and as such it can be rep- 
resented in OpenMath provided some primitive symbols are available in some 
CD. Symbols for constructing and eliminating inductive types in the Inductive 
Calculus of Constructions used by Coq are given in the CD called icc. Additionally, 
we have a private CD for symbols that Coq uses in proof-objects and that 
refer to introduction and elimination rules of inference. For example, the Coq 
symbol and_ind in the proof-object in Section 3 is represented by the OpenMath 
symbol coq:and_ind in the private CD coq and can be exactly defined in terms 
of the primitive inductive constructors. There are two reasons for representing 
the proof-object at a higher level than necessary. First, the extra information 
conveyed by the specific inference rule used in the proof can be directly used 
for tuning the natural language presentation of the term. The second reason is 
compactness and readability. The proof-object can be easily transformed into 
one which uses only the primitive inductive constructors. 

OpenMath is not designed to express hierarchies of mathematical knowledge 
since it lacks the mechanisms to relate definitions, theories and theorems. More- 
over, the command language used during an interactive session with a computa- 
tional tool is hard to express formally in OpenMath. Variable declarations and 
tactic scripts are examples of this limitation and help motivate the introduction 
of OpenMath Documents. 

OpenMath Documents The OpenMath Document Specification (OMDoc) 
[19], currently under development by Kohlhase and ourselves, is an XML document 
type definition that can be used to represent general mathematical knowledge 
the way it is written in lecture notes and in scientific articles, but also in 
mathematical software such as algebraic specification modules or library files of a 
proof checker. It is being used as the source format for the next release of the 
Algebra Interactive! book [8], an interactive textbook used in teaching first-year 
university algebra. OpenMath Documents are intended to be the input format 
for a knowledge base of mathematics, Mbase [18]. 

The mathematical objects within an OpenMath Document are expressed 
using an extension of the XML encoding^ of OpenMath. Most importantly, the 
use of OpenMath, which conveys the semantic content and not the presentational 
content of the mathematics, offers two major advantages: 

— It allows the mathematical knowledge base to use techniques such as pattern 
matching or unification to implement search, e.g. modulo an equational 
theory. 

— It equips OpenMath Documents with a standard language for the commu- 
nication among mathematical services, thus making them suitable to be 
exchanged between systems for symbolic computation and reasoning. 

In this paper we focus on using OMDoc for representing and publishing proofs 
in interactive mathematical documents. 

The format of an OpenMath Document provides an interface for proof pre- 
sentation by allowing fine-grained interleaving between the formally specified 
part of the proof and the informal, vernacular text. There are several options 
for writing a proof of an assertion. All of the views discussed in Section 2 may 
coexist in the same document and, depending on the viewer, be presented upon 
request. 

As the techniques for producing natural language descriptions from proof 
objects or from tactics/proof plans improve, we may well envision that simple 
conventional “informal” proofs will become reproducible by automatic machin- 
ery. For now, it is possible to simply include informal proofs mixing formal 
OpenMath statements with flat text. 

Interactive proofs described by a proof or a tactic script are put directly in 
an OpenMath Document and become executable once the document is loaded 
in a browser. Most of the interactivity available to the reader clicking through a 
proof is supported by a combination of Javascript and Java that is invoked on 
the correct data by relying on the Document Object Model [14]. Each step in 
the proof is a request of a performative that is sent to the appropriate software 
package server or to a broker in a network like [15, 27]. More interesting is the 
possibility of choosing different mathematical servers depending on the nature 
of the requested computation. For an example, see [16]. It is well known that, 
for instance, type-theoretical proof checkers are not very good at equational 
reasoning [1]. Trying to mimic equational reasoning in such systems produces 
long and unintelligible proof-objects, and understanding them requires 
deep knowledge of the proof assistant. This also means 
that in such cases, an interactive proof gives little insight to the reader and 
misses the point of the proof. 

^ It contains extra attributes for linking OpenMath subtrees. 

The OpenMath encoding of the proof-object is also stored in the document 
directly and can trigger the natural language viewer. Moreover, because it is 
encoded in standard OpenMath, the proof-object can be exchanged easily among 
proof checkers implementing similar type theories. 



MathML As we said, OpenMath is about conveying content and not about 
presentation. The same can be said of OpenMath Documents. Neither is 
meant to be read in its XML encoding; both are meant to be transformed to more 
convenient presentation formats for online browsing. 

Two technologies under development that produce customized output formats 
from an XML input source are based on the Cascading Style Sheets (CSS) language 
and the Extensible Stylesheet Language (XSL) [26]. Using these, it is possible 
to convert OpenMath Documents to various flavors of dynamic HTML using 
MathML-Presentation for the OpenMath objects [5]. Similarly, it is possible to 
generate LaTeX documents for producing a printed version of the material. 



5 Conclusions and Future Work 

We have presented some of the latest technologies we are developing in order to 
support true interaction in mathematical documents and in particular in math- 
ematical proofs. The key issue is being able to distinguish content from presen- 
tation. The content of a mathematical proof is a formal proof-object encoded in 
OpenMath, whereas the various presentations are given in an OpenMath docu- 
ment as conventional informal descriptions, or are automatically generated from 
the proof-object. Computational content of a proof is conveyed by including 
tactic scripts, which become executable when the document is presented in a browser. 

Future work lies in further developing the natural language viewer by includ- 
ing more libraries for primitives, and adding more authoring capabilities allow- 
ing user-customized translations. More work needs to be done on the OpenMath 
document and associated translation tools. In the end, we want to be able to 
create large interactive mathematical documents semi-automatically by using, 
for instance, the knowledge contained in a formal development done in a proof 
assistant. 



Abstract. In a mediator system based on annotated logics it is a suitable 
requirement to allow annotations from different lattices in one program 
on a per-predicate basis. These lattices however may be related 
through common sublattices, hence demanding predicates which are able 
to carry combinations of annotations, or access to components of anno- 
tations. 

We show both demands to be satisfiable by using various composition 
operations on the domain of complete bounded distributive lattices or 
bilattices, most importantly the free distributive product. 

An implementation of the presented concepts, based on the KOMET 
implementation of SLG-AL with constraints, is briefly introduced. 

Keywords: Annotated Logic, Distributive Lattices, Dual Transform, 
Free Distributive Lattice Product, Mediator, SLG Resolution. 



1 Introduction 

Complete distributive lattices and bilattices have been generally recognized as 
classes of lattices which due to their properties are highly suitable for use in 
annotated logic (AL). Recently, larger lattices than the well-known ones with a 
small finite number of elements, such as FOUR, have been investigated. One 
problem is the high computational complexity associated with larger lattices 
when used in general deduction procedures over annotated logic. Therefore AL 
did not seem very well suited to solving complex problems where deep searches 
over large search spaces are required. 

Various mechanized reasoning paradigms may be regarded as spread over a 
spectrum from highly specialized methods which are very efficient in their prob- 
lem domain, to more general methods which suffer complexity disadvantages. 

Computer Algebra Systems (CAS) are one class of specialized systems, con- 
taining large libraries of algorithms for different fields of mathematics. Another 
class at this end of the spectrum are natural language processing systems. 

Moving away from specialized single-domain systems, automated theorem 
provers (ATP) are suited to limited mediatory roles, at the same time remaining 
specialized to deep searches, as shown in [1] where a hierarchical relationship 
between a CAS and a strategic theorem prover is outlined. A limitation of ATPs 
in this situation is their lack of paraconsistent reasoning capabilities. 

Annotated logic systems are typically found towards the general side of the 
spectrum, in roles such as query distribution tools or, where paraconsistent rea- 
soning is required, as mediators. Annotations are often derived from a specialized 
application domain, and there either from its object domain, e.g. as sets of objects 
to which a result applies, or from additional information such as (un-)certainties 
associated with results. Measures of certainty from different specialist 
systems are often incompatible, making a comparison on qualitative terms 
necessary as opposed to quantitative comparison between e.g. a fuzzy and a 
probabilistic value. 

From this it follows that AL systems need to be flexible and extensible with 
respect to their annotations. It is well known that the size (for instance in terms 
of the number of grounded or of join-irreducible elements) of the annotation lat- 
tice is a major complexity parameter in AL proof procedures, therefore as more 
and more annotation domains are introduced into an AL system it would appear 
to become impractical. However, just as a logic program with a large number 
of predicate symbols typically uses only few of them in any single clause or in 
a set of closely related clauses, only a small number of annotation domains are 
connected by the literals in individual clauses and sets of closely related clauses. 
In other words, locally the annotation lattices do not grow boundlessly, while 
globally the annotation domain may grow almost proportionally to the number 
of specialized domains to be integrated. A traditional Generalized Annotated 
Logic Program (GAP) demands that all literals be annotated with elements of 
the same lattice, but from the preceding argument it would seem reasonable to 
demand that different lattices should be permitted in a single AL program. The 
only restriction is that because of the way satisfaction is defined in Def. 1(b), 
a single predicate symbol must always be annotated with elements of the same 
lattice. 

The annotation values in a clause instance are not unrelated, even if they 
come from different lattices. For example, one would like to be able to pass a cer- 
tainty value from one or more body literals to the head literal even if those literals 
have different annotation lattices. One method to accommodate this requirement 
is that of multiple annotations: a predicate symbol has not a single annotation 
lattice but a fixed tuple of annotation lattices. The sharing of annotation values 
through annotation variables would be permitted between same-lattice annota- 
tions of literals in a clause. However, the semantics of multiply annotated literals 
would remain to be defined. We present a more general solution, allowing various 
ways of composing and decomposing lattices and lattice values, along with one 
version of multiple annotations which is encompassed by our solution. 

Another, unrelated reason to introduce generic structuring into annotation 
lattices is that some examples of complex AL applications naturally construct 
their annotation domains in a structured manner. As an example we will present 
how stable models of a logic program with negation would be translated to embed 
them into a mediatory AL system. 

Following some definitions in Section 2, we present the free distributive prod- 
uct and a few other composition operations on the domains of complete bounded 
distributive lattices and bilattices in Section 3, and their implementation in 
KOMET in Section 4. The aforementioned example is presented in Section 5. 



2 Definitions 

Annotated logics have been studied since ca. 1989 as an outgrowth of multi-valued 
logics where the set of truth values is a lattice (e.g. [3,10,13]). They are set 
apart from general multi-valued logics by the use of bilattices, the treatment of 
inconsistencies, and two types of negation, called epistemic and ontological e.g. 
in [10], and explicit and default in more recent literature. Usually, satisfaction of 
annotated atoms, and therefore of complex formulas, is two-valued. A framework 
for clausal logic programs using this type of annotated logic was introduced under 
the term Generalized Annotated Logic Programs (GAP) by Kifer and Subrahmanian [11]. 
At the same time, logic programs with multiply annotated atoms were proposed 
whose semantics were not based on lattice properties of the sets of annotations, 
e.g. [14]. The following definition already takes into account that we want to 
allow each predicate to carry a different annotation lattice. 

Definition 1 (GAP). 

(a) A GAP signature consists of a first-order predicate logic (PL1) signature 
(disjoint sets of symbols for predicates, object variables, functions and constants) 
and an annotation signature, which consists of a complete bounded distributive 
lattice $L_p$ for each predicate symbol $p$, disjoint sets of symbols for annotation 
variables for each lattice, and annotation function symbols. All predicate and 
function symbols are considered to have unique arity; furthermore, all annotation 
function symbols have fixed argument and result types. 

The set $Ann(L)$ of annotations of an annotation signature that is associated 
with a lattice $L$ is defined recursively: 

— Every lattice element $a \in L$ is a constant annotation. 

— Every annotation variable is a variable annotation. 

— For any $k$-ary annotation function symbol $f$ with result type $L$ and annotations 
$t_1 \ldots t_k$ from the appropriate $Ann(L_1) \ldots Ann(L_k)$, $f(t_1 \ldots t_k)$ is a 
complex annotation. 

An annotated literal $p : a$ consists of a PL1 atom $p$ and an annotation $a \in 
Ann(L_p)$. 

A negative annotated literal is $\mathit{not}\ p : a$ where $p : a$ is a (positive) annotated 
literal. This negation is called default negation. 

If $L$ is a bilattice, the explicit negation of a literal is $\neg p : a := p : \neg a$, where 
the $\neg$ on the right side is the bilattice negation. 

Generalized annotated logic programs are built as conjunctions of clauses of 
annotated literals in the usual manner: 

$$p : a \leftarrow q_1 : b_1 \wedge \cdots \wedge q_n : b_n \wedge \mathit{not}\ r_1 : c_1 \wedge \cdots \wedge \mathit{not}\ r_m : c_m \qquad (1)$$ 

where $p : a$ is called the head and $q_1 : b_1 \wedge \cdots \wedge \mathit{not}\ r_m : c_m$ the body of the clause. The 
head or body of a clause may be empty. All variables that appear in the head must 
also occur in the body. Unbound object and annotation variables are implicitly 
universally quantified. 

(b) An interpretation of a GAP consists of a mapping $M : H \to \bigcup_p L_p^{int}$ of 
the Herbrand universe $H$ of the PL1 signature of the GAP into the interpretation 
lattices $L_p^{int}$, such that $M(A) \in L_p^{int}$ if $p$ is the predicate symbol of the ground 
literal $A$. $L_p^{int}$ may be $L_p$ itself, the ideal lattice $I(L_p)$ or some other sublattice of 
the powerset lattice $P(L_p)$; the choice must be the same for all $p$. Furthermore, an 
interpretation maps every annotation function symbol to an evaluable function 
over the appropriate lattices. Satisfaction is defined as follows: 

$$\begin{array}{lll}
M \models p : a & \text{iff } a \le M(p) & \text{if } L_p^{int} = L_p \\
M \models p : a & \text{iff } a \in M(p) & \text{if } L_p^{int} = I(L_p) \text{ or } L_p^{int} \subseteq P(L_p) \\
M \models \neg p : a & \text{iff } M \models p : (\neg a) & \\
M \models \mathit{not}\ p : a & \text{iff } M \not\models p : a &
\end{array}$$ 

Satisfaction of a composite formula is defined recursively in the usual (two-valued) 
manner. Where function symbols appear in a formula, satisfaction is 
determined by evaluating the function which substantiates the function symbol. 

The semantics resulting from $L^{int} = L$ was called restricted in [11]; the one 
with $L^{int} = I(L)$ is called general. 
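
The two satisfaction conditions for positive literals can be compared on a small example. The Python sketch below is an illustration under stated assumptions, not part of KOMET: it takes the divisors of 12 ordered by divisibility as a hypothetical annotation lattice, and in the general case it only considers principal ideals, for which membership coincides with the ordering test of the restricted case.

```python
# Hypothetical annotation lattice: divisors of 12, ordered by divisibility.
ELEMENTS = [1, 2, 3, 4, 6, 12]


def leq(a: int, b: int) -> bool:
    """a <= b in the lattice iff a divides b."""
    return b % a == 0


def principal_ideal(m: int) -> frozenset:
    """The ideal of all lattice elements below m."""
    return frozenset(d for d in ELEMENTS if leq(d, m))


def sat_restricted(M: dict, p: str, a: int) -> bool:
    """M |= p : a  iff  a <= M(p), where M(p) is a lattice element."""
    return leq(a, M[p])


def sat_general(M: dict, p: str, a: int) -> bool:
    """M |= p : a  iff  a is a member of M(p), where M(p) is an ideal."""
    return a in M[p]


def sat_default_not(sat, M: dict, p: str, a: int) -> bool:
    """M |= not p : a  iff  M does not satisfy p : a."""
    return not sat(M, p, a)


M_restricted = {"p": 6}
M_general = {"p": principal_ideal(6)}
agree = all(sat_restricted(M_restricted, "p", a) == sat_general(M_general, "p", a)
            for a in ELEMENTS)
```

Explicit negation is left out of the sketch because it requires a bilattice negation, which the divisor lattice does not provide.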

Completeness and boundedness of annotation lattices are required in order 
for the restricted and general semantics of GAPs to remain closely related to 
those of non-annotated clausal logic programs [11]. 

Under the restricted or general semantics, some logic programs do not have 
a model. In the two-valued case, the well-founded semantics under which all 
safe logic programs have a three-valued model has gained wide acceptance. The 
extension of the well-founded semantics to GAPs has been described in [15]. 

Definition 2 (Well-Founded Semantics of GAPs). 

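1. Given a set of annotated clauses $P$ and an interpretation $I$, a set of transformed 
clauses (the Gelfond-Lifschitz transform) is defined as 

$$\begin{array}{rl}
G(P, I) = \{ & p : a \leftarrow q_1 : b_1 \wedge \cdots \wedge q_n : b_n \mid \\
 & p : a \leftarrow q_1 : b_1 \wedge \cdots \wedge q_n : b_n \wedge \mathit{not}\ r_1 : c_1 \wedge \cdots \wedge \mathit{not}\ r_m : c_m \\
 & \text{is a ground instance of a clause in } P, \\
 & \forall i = 1 \ldots m : I(r_i) \not\ge c_i \ \}
\end{array}$$ 

$G(P, I)$ is an annotated clause set without negation, and the mapping $I \mapsto 
G(P, I)$ is antimonotonic. Therefore, $G_P(I) := \mathrm{lfp}(R_{G(P,I)})$ exists and $G_P$ is 
antimonotonic. 

2. $G_P^2$ is monotonic and has a least and a greatest fixpoint. The well-founded 
semantics of $P$ is defined as 

$$\begin{array}{ll}
P \models p : a & \text{iff } a \le_{L_p} \mathrm{lfp}(G_P^2)(p) \\
P \models \mathit{not}\ p : a & \text{iff } a \not\le_{L_p} \mathrm{gfp}(G_P^2)(p)
\end{array}$$ 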
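
The construction in Definition 2 can be followed on a tiny ground program. The Python sketch below is an illustration only and is unrelated to the SLG-AL procedure: it assumes a single annotation chain 0 < 1 < 2 for all predicates, the restricted semantics, and a hypothetical three-clause program; the names reduct, least_model, g_op and iterate_square are invented for the example.

```python
from typing import Dict, List, Tuple

Lit = Tuple[str, int]                       # annotated atom  p : a
Clause = Tuple[Lit, List[Lit], List[Lit]]   # (head, positive body, negated body)
Interp = Dict[str, int]

BOTTOM, TOP = 0, 2                          # illustrative chain 0 < 1 < 2


def reduct(program: List[Clause], interp: Interp):
    """G(P, I): keep a clause iff I(r) >= c fails for every 'not r : c',
    then drop its negated literals."""
    return [(head, pos) for head, pos, neg in program
            if all(not interp[r] >= c for r, c in neg)]


def least_model(definite, atoms: List[str]) -> Interp:
    """Least fixpoint of the consequence operator of a negation-free program."""
    model = {p: BOTTOM for p in atoms}
    changed = True
    while changed:
        changed = False
        for (p, a), pos in definite:
            if all(model[q] >= b for q, b in pos) and a > model[p]:
                model[p] = a                # join with the old value (max in a chain)
                changed = True
    return model


def g_op(program, interp, atoms) -> Interp:
    """G_P(I) = lfp of the consequence operator of G(P, I)."""
    return least_model(reduct(program, interp), atoms)


def iterate_square(program, start: Interp, atoms) -> Interp:
    """Iterate G_P applied twice until a fixpoint is reached."""
    current = start
    while True:
        nxt = g_op(program, g_op(program, current, atoms), atoms)
        if nxt == current:
            return current
        current = nxt


ATOMS = ["p", "q", "r", "s"]
PROGRAM: List[Clause] = [
    (("p", 2), [], [("q", 1)]),             # p : 2 <- not q : 1
    (("q", 2), [], [("p", 1)]),             # q : 2 <- not p : 1
    (("r", 1), [], [("s", 1)]),             # r : 1 <- not s : 1
]
lfp = iterate_square(PROGRAM, {x: BOTTOM for x in ATOMS}, ATOMS)
gfp = iterate_square(PROGRAM, {x: TOP for x in ATOMS}, ATOMS)


def entails(p: str, a: int) -> bool:
    return a <= lfp[p]


def entails_not(p: str, a: int) -> bool:
    return not a <= gfp[p]
```

In this example r : 1 and not s : 1 are entailed, while neither p : 1 nor not p : 1 is, which reflects the three-valued character of the well-founded model.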

A proof procedure which is complete and sound with respect to the well-founded 
semantics is SLG [7]. An annotated extension of SLG, called SLG-AL, 
has been described by P. Kullmann [12], who also provided the implementation 
KOMET which is the basis for an implementation of structured lattices pre- 
sented here. The long description of the proof procedure as well as soundness 
and completeness proofs for the two-valued and annotated cases are omitted 
here and can be found in [7] and [12] respectively. 
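The construction in Definition 2 can be illustrated in its simplest special case. The following is a minimal sketch, assuming a ground, two-valued program without annotations; the clause encoding and all function names are illustrative assumptions and are not taken from SLG, SLG-AL or KOMET.

```python
# Two-valued, ground special case of Definition 2 (annotations omitted).
# A clause is (head, positive_body, negative_body); all names are illustrative.

def gl_transform(program, interp):
    """G(P, I): drop clauses with some 'not r' where r is in I, then strip negation."""
    return [(h, pos) for (h, pos, neg) in program if not (set(neg) & interp)]

def least_model(definite):
    """Least fixpoint of the immediate consequence operator of a negation-free program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def gamma(program, interp):
    """The antimonotonic operator I -> lfp(R_{G(P, I)})."""
    return least_model(gl_transform(program, interp))

def well_founded(program):
    """Return (lfp, gfp) of gamma applied twice: the true and the not-false atoms."""
    true_atoms = set()
    while True:
        nxt = gamma(program, gamma(program, true_atoms))
        if nxt == true_atoms:
            break
        true_atoms = nxt
    # For an antimonotonic operator, gamma maps the least fixpoint of
    # gamma^2 to its greatest fixpoint.
    return true_atoms, gamma(program, true_atoms)

program = [
    ("p", [], ["q"]),   # p <- not q
    ("q", [], ["p"]),   # q <- not p
    ("r", [], ["s"]),   # r <- not s
    ("s", [], []),      # s
]
true_atoms, possible = well_founded(program)
```

In this example s is true, r is false (it is outside the greatest fixpoint), and p and q remain undefined. The annotated case replaces set membership by comparisons in the annotation lattice.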

3 Lattice Products and Other Composite Lattices 

The following sections introduce some lattice composition operations. The best 
known such operation is the cross product; however there are several other, at 
least equally important operations, among them the free distributive product. 

3.1 Composition Operations 

An annotated clause as in (1) can be read as

If $I(q_1) \geq b_1$ and ... and $I(q_n) \geq b_n$ and $I(r_1) \not\geq c_1$ and ... and
$I(r_m) \not\geq c_m$, then $I(p) \geq a$.

Read as a condition on interpretations, $I(p), \ldots, I(r_m)$ are merely placeholders,
which we subsequently write as variables $x_i$. The set of annotation conditions
$L' := \{(x \geq a) \mid a \in L\} \cup \{\mathit{false}\}$ on variables $x \in L - \{\top\}$ with reverse
implication ordering is a lattice which is isomorphic to L.

Assume from now on that all lattices $L_i$ are distributive and bounded. If a
literal $p : a_1, \ldots, a_k$ is multiply annotated with $a_i \in L_{p_i}$, the value assigned to
p by an interpretation is a tuple $I(p) = (I_1(p), \ldots, I_k(p))$, $I_i(p) \in L_{p_i}$, and the
obvious extension of the above reading would be

If $I_1(p) \geq a_1$ and ... and $I_k(p) \geq a_k$ ...

The set of multiple annotation conditions
$L^* = \{(x_1 \geq a_1) \wedge \cdots \wedge (x_k \geq a_k) \mid a_i \in L_{p_i}, i = 1 \ldots k\}$
is no longer a lattice, but only a meet-semilattice. Still, this
approach is frequently taken for the semantics of multiple annotations.

To extend L* to a lattice, we add disjunctive expressions, i.e. L** is the set 
of expressions recursively defined as 

1. Every $(x_i \geq a_i)$ is a member of $L^{**}$,

2. If $A, B \in L^{**}$, then $A \vee B \in L^{**}$ and $A \wedge B \in L^{**}$,

3. true and false are members of $L^{**}$.

The expressions thus defined may be simplified using absorption and distributive
laws. Comparison is defined through subsumption. The definition implies that
each $L_i$ is embedded homomorphically in $L^{**}$ and $L_i \cap L_j = \{\bot, \top\}$ for $i \neq j$.
Thus $L^{**}$ is the free distributive lattice generated by the $L_i'$, or the free
distributive product of the $L_i$.

The preceding transformation from lattices to sets of inequalities is not
required for the construction of free distributive products; however, it may be
useful for understanding their application to annotated logic. We write

$$L = L_1 \otimes \cdots \otimes L_k \qquad (2)$$

L is bounded and distributive, and if all $L_i$ are complete, L is also a complete
lattice [9].

In free distributive lattice products (FDLP) there are strong, well-known
normal form properties (e.g. [9]). If an element $a \in L$ is given as a polynomial
expression

$$a = \bigvee_{j=1}^{n} \bigwedge_{i \in I} a_{ji}, \qquad a_{ji} \in L_i - \{\bot\}, \qquad (3)$$

its unique conjunctive normal form consists of the join-terms of the fully
expanded dual transform of the polynomial which are not subsumed by other
terms. Its computation is closely related to that of reduced dual transforms
in other domains, thus the same algorithmic techniques can be used and some
algorithms can easily be adapted for normal form computation in FDLPs. The
exceptions are as follows: Algorithms that attempt to generate a minimal number
of terms do not produce the unique normal form. Also, lattices admit
simplification of terms by replacing sets of mutually incomparable elements of the
same lattice with their join or meet, as appropriate, which is not possible
in propositional logic and is therefore not exploited by most algorithms. On the
other hand, elimination of tautological or paradoxical terms containing
complementary literals, a standard optimisation in propositional or predicate logic, is
generally not possible in non-boolean lattices.
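The dual transform with subsumption elimination can be sketched as follows. This is a naive illustration under stated assumptions, not the algorithm used in KOMET: every component lattice is assumed to be a chain of integers (so the join within one component is `max`), only values strictly between the bounds occur, and the term encoding and function names are hypothetical.

```python
from itertools import product

# Assumption: every component lattice is a chain of integers (join = max),
# and terms use only values strictly between the bounds.
# A meet-term is a dict {component_index: value}; a polynomial is a list of meet-terms.

def dual_transform(meet_terms):
    """Expand a join of meet-terms into a meet of join-terms by distributivity."""
    join_terms = []
    # A "path" picks one (component, value) literal from every meet-term.
    for path in product(*(term.items() for term in meet_terms)):
        joined = {}
        for comp, val in path:
            # Literals from the same component lattice are replaced by their join.
            joined[comp] = max(val, joined.get(comp, val))
        if joined not in join_terms:
            join_terms.append(joined)
    return join_terms

def subsumes(s, t):
    """Join-term s lies below join-term t, so t is redundant in a meet."""
    return all(c in t and v <= t[c] for c, v in s.items())

def reduce_terms(join_terms):
    """Keep only the join-terms that are not subsumed by a different term."""
    return [t for t in join_terms
            if not any(s != t and subsumes(s, t) for s in join_terms)]

# (a2 ^ b1) v a1, with a1 < a2 in component 0 and b1 in component 1
cnf = reduce_terms(dual_transform([{0: 2, 1: 1}, {0: 1}]))

# a1 v (a1 ^ b1): the second join-term is absorbed
absorbed = reduce_terms(dual_transform([{0: 1}, {0: 1, 1: 1}]))
```

The first result encodes $a_2 \wedge (a_1 \vee b_1)$. Note the step that joins literals of the same component, which has no counterpart in propositional logic.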

The cross product, written as $\times$, is defined as
$L_1 \times \cdots \times L_k = \{(a_1, \ldots, a_k) \mid a_i \in L_i\}$ with
$(a_i)_i \leq (b_i)_i \Leftrightarrow a_i \leq b_i,\ i = 1 \ldots k$, and meet and join are defined
componentwise. Again, boundedness, completeness and distributivity of factors are
preserved in the product. In the terminology of categories, the cross product is
a sum or co-product, sometimes written as $\oplus$, and distributive laws hold for
the free distributive product and the cross product on the class of bounded
distributive lattices.

In the previous two definitions, all lattices have equal weight in their compos- 
ite ordering. By contrast, the lexicographic product assigns strict priorities 
to its components. 

Definition 3 (Lexicographic Product). Given distributive bounded lattices
$L_i = (M_i, \leq_i)$, $i = 1 \ldots n$, and $M := M_1 \times \cdots \times M_n$ their cross product as sets.
M is partially ordered by

$$(a_1, \ldots, a_n) <_{lex} (b_1, \ldots, b_n) \quad\text{iff}\quad
\exists\, i,\ 1 \leq i \leq n, \text{ such that } a_i < b_i \text{ and } \forall k,\ 1 \leq k < i : a_k = b_k \qquad (4)$$

If all $L_i$ are totally ordered, so is $(M, \leq_{lex})$.

Let $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n) \in M$ and i maximal, $1 \leq i \leq n + 1$,
such that $\forall k,\ 1 \leq k < i : a_k = b_k$. Then

$$a \vee_{lex} b = \begin{cases}
a & \text{if } a \geq_{lex} b \\
b & \text{if } a <_{lex} b \\
(c_1, \ldots, c_n) & \text{if } a, b \text{ incomparable}
\end{cases}
\qquad\text{with}\qquad
c_k = \begin{cases}
a_k & \text{if } 1 \leq k < i \\
a_k \vee b_k & \text{if } k = i \\
\bot & \text{if } i < k \leq n
\end{cases}$$

and $\wedge_{lex}$ defined dually. Then $L_{lex} := (M, \leq_{lex}, \vee_{lex}, \wedge_{lex})$ is a lattice.
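The join of Definition 3 can be sketched directly. The code below is an illustration only: each component lattice is passed as a hypothetical triple (order test, join, bottom element), and positions are counted from 0 rather than from 1.

```python
def lex_leq(a, b, comps):
    """a <=_lex b: equal, or strictly smaller at the first differing position."""
    for x, y, (leq, _join, _bot) in zip(a, b, comps):
        if x != y:
            return leq(x, y)
    return True

def lex_join(a, b, comps):
    """Least upper bound of a and b in the lexicographic product."""
    if lex_leq(b, a, comps):
        return a
    if lex_leq(a, b, comps):
        return b
    # a and b are incomparable: they agree before position i, and
    # a[i], b[i] are incomparable in component i.
    i = next(k for k in range(len(a)) if a[k] != b[k])
    _leq, join, _bot = comps[i]
    return a[:i] + (join(a[i], b[i]),) + tuple(c[2] for c in comps[i + 1:])

# Component 0: a powerset lattice; component 1: the chain [0, 1].
SETS = (lambda x, y: x <= y, lambda x, y: x | y, frozenset())
CHAIN = (lambda x, y: x <= y, max, 0.0)
comps = (SETS, CHAIN)

incomparable = lex_join((frozenset("u"), 0.9), (frozenset("v"), 0.4), comps)
comparable = lex_join((frozenset("u"), 0.2), (frozenset("u"), 0.7), comps)
```

In the incomparable case the first component becomes the join {u, v} and the remaining component drops to its bottom element, since any tuple with first component {u, v} already dominates both arguments.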

A typical use of a lexicographic product occurs in mediators, where a best
solution in some sense has to be selected from a set of possible solutions and it
is therefore desirable to have a unique maximal element in any set.

Other operations on lattices which preserve completeness, distributivity and
boundedness are order reversal and the addition of a new bottom element $\bot'$
below $\bot$. The latter is helpful if lattice values are to be imported from an external
source but the appearance of its $\bot$ in a term should not cause the whole term
to be considered $\bot$ and therefore eliminated, as is the case with the standard
embedding of components in FDLPs.

The operations on the domain of complete bounded distributive lattices (of 
which bilattices are an important subdomain) that have been introduced so far 
are connected in various ways. One of these is the aforementioned distributivity 
of FDLP and cross product. We briefly state a few other results; the proofs and 
further discussion can be found in [20].

Bilattices can be constructed uniquely as the cross product of two distributive 
lattices, and if the bilattice is symmetrical, the two lattices are isomorphic. In 
this case one writes B = B{L). 

Lemma 1. A symmetrical bilattice $B = B(L)$ is isomorphic to the FDLP
$B = L \otimes FOUR$, where $FOUR = \{\bot, t, f, \top\}$ with the ordering $\bot < t, f < \top$.

Lemma 2. The smallest distributive bounded lattice $E = \{\bot, \top\}$ is the neutral
element of the domain of distributive bounded lattices with respect to the FDLP,
i.e. $E \otimes L = L$.

Lemma 3. If $M_i \leq L_i$ are sublattices with $\top_{M_i} = \top_{L_i}$, $\bot_{M_i} = \bot_{L_i}$, then
$M = M_1 \otimes \cdots \otimes M_n \leq L = L_1 \otimes \cdots \otimes L_n$. In particular, setting
$M_{k+1} \ldots M_n = E$ gives $M_1 \otimes \cdots \otimes M_k \leq L$.

The last statement means that the difference between the traditional definition
of GAPs, which demands the same lattice for all annotations, and the relaxed
form, which allows each predicate symbol to be annotated with a different lattice,
may be reconciled, since the different lattices can be embedded into a single
FDLP.

The latter remark raises a notational difficulty, namely identical and isomor- 
phic components which are shared between composite annotation lattices need 
to be distinguished. In KOMET this is currently only possible on a clause by 
clause basis through variables which appear in several annotations in the clause. 

Also, purely from the viewpoint of the lattices involved, it would appear
possible to embed non-bilattices in bilattices, e.g. $L = E \otimes L \leq FOUR \otimes L = B(L)$.
However, identifying E with {false, true} leads to the embedding $true \mapsto \top$, which
is semantically unreasonable. This coincides with the fact that an AL program
with bilattice annotations needs to be written very differently from an AL pro- 
gram with non-bilattice annotations and it would therefore be questionable to 
mix those two types of annotations in an AL program. A more reasonable, if not 
highly complex embedding would take the three-valued well-founded model of a 
general two-valued logic program as an element of an interpretation bilattice. 

3.2 Join-Irreducibility 

Recent work on AL which focused on the computational complexity of inference 
has brought the concept of join-irreducibility to the front (e.g. [16]). Represent- 
ing lattice elements as joins of join-irreducible elements has been recognized 
as helpful in reducing the computational complexity, therefore determining the 
number of such elements in a lattice and identifying the subset of all such ele- 
ments becomes an implementation concern. 

A lattice element $a \in L$ is join-irreducible iff for any $x, y \in L$, $x \vee y = a$
implies $x = a$ or $y = a$. The set of join-irreducible elements of a lattice L is
written $JIR(L)$.

Lemma 4 (Birkhoff). Every element of a distributive lattice with descending 
chain property has a unique non-redundant representation as the join of join- 
irreducible elements. 
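For small finite lattices the join-irreducible elements can be found by brute force straight from the definition. The sketch below is illustrative only; the function name is hypothetical, and the bottom element is excluded by convention, which matches the entries of Table 1.

```python
from itertools import combinations

def join_irreducibles(elements, join, bottom):
    """Brute force: a is join-irreducible iff x v y = a implies x = a or y = a."""
    result = []
    for a in elements:
        if a == bottom:
            continue  # the bottom element is excluded by convention
        if all(x == a or y == a
               for x in elements for y in elements if join(x, y) == a):
            result.append(a)
    return result

# Powerset lattice of M: the join-irreducible elements are the singletons.
M = ["u", "v", "w"]
powerset = [frozenset(c) for r in range(len(M) + 1) for c in combinations(M, r)]
jir_powerset = join_irreducibles(powerset, lambda x, y: x | y, frozenset())

# Totally ordered set: every element except the bottom is join-irreducible.
chain = [0, 1, 2, 3]
jir_chain = join_irreducibles(chain, max, 0)
```

The check is cubic in the size of the lattice, so it is only meant to make the definition concrete, not to be used on composite lattices.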

The join-irreducible elements of elementary lattices are easily determined. 
In our situation, where lattices are built from the ground up from elementary 
lattices, the join-irreducible elements of some composites are also known. This 
knowledge is not exploited directly, since it does not appear to offer much advan- 
tage over the computation of normal forms based on the implementations of the 
component lattices. Elementary lattice implementations can be expected to take 
advantage of the join-irreducible representation of their elements. The follow- 
ing table lists some elementary and composite lattices and their join-irreducible 
subsets. 

In particular, if the complexity of operations in lattices is assumed to be
$O(|JIR(L)|)$, the complexity of FDLP operations is exponential in the number
of components. The same can be achieved by an implementation of FDLP
using normal form computations and $O(|JIR(L)|)$ implementations of the
components.

| Lattice $L$ | $JIR(L)$ | $\lvert JIR(L) \rvert$ |
|---|---|---|
| Powerset $L = P(M)$ | $\{\{m\} \mid m \in M\}$ | $\lvert M \rvert$ |
| Totally ordered set | $L - \{\bot\}$ | $\lvert L \rvert - 1$ |
| Cross product $L_1 \times \cdots \times L_n$ | $\{(x, \bot, \ldots, \bot) \mid x \in JIR(L_1)\} \cup \cdots \cup \{(\bot, \ldots, \bot, x) \mid x \in JIR(L_n)\}$ | $\sum_{i=1}^{n} \lvert JIR(L_i) \rvert$ |
| FDLP $L_1 \otimes \cdots \otimes L_n$ | $\{x_1 \otimes \cdots \otimes x_n \mid x_i \in JIR(L_i)\}$ | $\prod_{i=1}^{n} \lvert JIR(L_i) \rvert$ |
| Raised lattice $L \cup \bot' < \bot$ | $JIR(L) \cup \{\bot\}$ | $\lvert JIR(L) \rvert + 1$ |

Table 1. Join-Irreducible Elements of Elementary and Composite Lattices



The composition approach offered here does not suffer in situations where
$JIR(L)$ is not easily obtainable. For example, the join-irreducible elements of an
order-reversed lattice $L^{rev}$ would correspond to the meet-irreducible elements of
L, which may in general be a completely different set. In our elementary lattices,
however, the sets of meet- and join-irreducible elements are closely related.



3.3 Constraint Representation of Composite Annotations 

We still need to show how composition and decomposition of lattice values is 
carried out at the clause (instance) level. For this it is necessary to discuss the 
additional SLG-AL concepts of constraints and modes. 

The original definition of lattice function instantiations as evaluable functions 
in GAPs was extended and reinterpreted in the KOMET version of SLG-AL 
through constraints. Besides lattices, built-in sorts (types for object variables) 
and connections with external information sources are represented as constraints 
within KOMET. 

Constraints, like predicates, have a number of arguments, each of a fixed
sort, and a lattice annotation. n-ary functions are regarded as two-valued (i.e.
annotated with $TWO = \{\bot, \top\}$) (n+1)-ary constraints.

Polymorphism is not used in KOMET; however, constraints can be polymodal.
A mode of an n-ary constraint is an n-tuple of elements of the set
{bound, unbound, any}. A constraint in a clause instance is evaluable according
to a mode when every argument of type unbound is a variable and every
bound argument is ground. Evaluation of a constraint results in one or more
answer substitutions which bind the variable arguments. Modes offer substantial,
flexible control over the order of subgoal evaluation.
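The evaluability test described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not KOMET's interface: arguments are flat (either a variable or a constant, so every non-variable is ground), and the class `Var`, the function names and the constraint names in the example are hypothetical.

```python
class Var:
    """An unbound logic variable (hypothetical stand-in for a clause variable)."""
    def __init__(self, name):
        self.name = name

def evaluable(args, mode):
    """True if the argument list satisfies the mode tuple."""
    for arg, m in zip(args, mode):
        is_var = isinstance(arg, Var)
        if m == "unbound" and not is_var:
            return False
        if m == "bound" and is_var:
            return False
        # "any" accepts both variables and ground arguments
    return True

def first_evaluable(subgoals, modes):
    """Pick the first subgoal that is evaluable under one of its declared modes."""
    for name, args in subgoals:
        if any(evaluable(args, m) for m in modes[name]):
            return name
    return None

modes = {
    "Solve": [("bound", "unbound", "unbound", "unbound")],
    "Raise": [("bound",)],
}
subgoals = [
    ("Raise", (Var("A"),)),
    ("Solve", ("prog", Var("A"), Var("B"), Var("C"))),
]
chosen = first_evaluable(subgoals, modes)
```

Although `Raise` is listed first, it is skipped because its only mode requires a bound argument; this is how modes steer the order of subgoal evaluation.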

Returning to composite annotations, we again consider multiple annotations
as a guiding example. If at the end of the evaluation of a subgoal
$p : X_1 \wedge \ldots \wedge X_n$, an answer $p : a$ has been obtained and
$a = \bigvee_{j=1}^{r} \bigwedge_{k=1}^{n} a_{jk}$ (where $a_{ji}, X_i \in L_i$),
it is reasonable to return r answer substitutions, each of which extracts one
of the conjunctive terms, i.e. $\{X_i \to a_{ji}, i = 1 \ldots n\}$. More generally, a literal
(partially instantiated) with a mixed variable and ground annotation, translated
into FDLP notation, would look like $p : u$ where
$u = a_{i_1} \wedge \cdots \wedge a_{i_k} \wedge X_{j_1} \wedge \cdots \wedge X_{j_l}$,
and $a_i, X_i \in L_i$. If the term u includes more than one variable, a query may yield
multiple, mutually incomparable answers, corresponding to conjunctive terms of
a disjunctive normal form.

In KOMET, constant composite annotations can be specified using lattice 
class specific constructor syntaxes, variable composite annotations are obviously 
given as variable symbols, and mixed annotations are written using constructor 
constraints which are specific to each composite lattice type. The constructor 
constraints generally have two modes, one where all arguments are bound and 
the value is unbound (the constructor mode), and one where the value and 
some but not all arguments are bound and the remaining arguments are variable 
(deconstructor modes). As shown in the last paragraph, the FDLP deconstructor 
may return multiple answer substitutions. 
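The behaviour of a deconstructor mode on a disjunctive answer can be sketched as follows. This is a simplified illustration, not KOMET's constructor constraint: an answer is assumed to be a list of meet-terms that each carry a value for every component, and the function name and encoding are hypothetical.

```python
def deconstruct(dnf_answer, variables):
    """Return one substitution per conjunctive term of a DNF answer.

    dnf_answer: list of meet-terms, each a dict {component: value}
    variables:  dict {component: variable_name} for the unbound arguments
    """
    return [{var: term[comp] for comp, var in variables.items()}
            for term in dnf_answer]

answer = [{0: "t", 1: "f"}, {0: "f", 1: "t"}]      # (t ^ f) v (f ^ t)
substitutions = deconstruct(answer, {0: "X1", 1: "X2"})
```

Each conjunctive term yields its own substitution, so a single answer with r terms causes r branches in the remaining evaluation.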

Finding a workable but fully flexible syntax is a challenge. For instance, if
$M = M_1 \otimes \cdots \otimes M_r$, $N = N_1 \otimes \cdots \otimes N_s$ and
$L = M_1 \otimes \cdots \otimes M_r \otimes N_1 \otimes \cdots \otimes N_s$,
then also $L = M \otimes N$. One would like to be able to write a single variable for
each part, to avoid the mentioned multiple answers which may just be reduced 
to one in another literal, among other reasons. But a syntax which allows this 
and also handles cases where the components are arranged differently in L is 
probably cumbersome. This problem has not yet been solved completely. 



4 Implementation 

As discussed in the introduction, annotated logic with strong negation and com- 
posite annotation lattices is a formalism well suited for mediators and similar 
applications. For this reason it was implemented in KOMET, a modular pro- 
gram with an extensible type system at the base, an implementation of SLG-AL 
resolution with constraints at its core and multiple object and lattice classes on 
top of these components. Concerns with speed and software engineering, partic- 
ularly the easier integration with common relational and object databases and 
more generally with arbitrary library and network interfaces, led to the decision
not to implement it in Prolog like many more logic-centered systems, but in
C++, which was at the time (1994) chosen over Java for its maturity. KOMET's
purpose is the investigation of mediator programming techniques and of the
suitability of annotated logic for mediators. It is called a mediator shell because of
its open architecture [4,5,6,12]. 

The first version of KOMET contained several simple lattices and the cross 
product, and the infrastructure to easily add more base lattices and lattice com- 
position operations. 

Of the lattice operations discussed in the previous section, only the FDLP 
presents notable challenges for an implementation. Elements are commonly rep- 
resented as polynomials, either disjunctions of meet-terms or conjunctions of 
join-terms. For the comparison of FDLP elements, criteria are available which 
are based on either normal form [9,20]. The normal form computation is therefore 
the key element of an FDLP implementation. 

A review of several dual transform algorithms of the matrix path search 
category (e.g. [17,18]) shows that after taking into account the aforementioned
differences between propositional or predicate logic on one hand and non-boolean
distributive lattices on the other, many refinements are lost. Therefore our ex- 
ample implementation contains only a relatively simple algorithm of this class. 

A promising, very general dual transformation algorithm of a different type 
is presented in [2]. Its definition encompasses the FDLP case as well as
propositional and predicate logics. Its strength comes from the fact that it organizes the
propagation of partial dual terms in such a manner that it will not generate any 
terms that can be subsumed, thus avoiding the usual step of finding and elimi- 
nating subsumed terms. It does generate large numbers of identical terms, which 
are guaranteed to be collected in one node without prior comparison but still 
need to be removed, e.g. by hashing and common subexpression detection. Pro- 
log provides this as part of most implementations, which facilitates the efficient 
implementation of this algorithm. As part of the work presented in this paper, 
the algorithm was implemented in a straightforward, non-optimized manner in 
C++ within KOMET. That implementation could be used to verify the
applicability of the algorithm to FDLPs. Its performance was inferior to the simple
matrix path search algorithm, however this is most likely due to the nonoptimal 
implementation and not the algorithm itself, so no conclusion could be reached 
about its claimed efficiency. 

5 Example: Weighted Stable Models 

A two-valued logic program with negation may in general have more than one 
stable model. These models represent interpretations which are consistent with 
the program, but a two-valued proof procedure is not able to infer any preference
among those models. A hypothetical mediator would pass a program P to an
external prover with stable semantics, returning a three-valued answer
substitution of the result variables A, B, C for each of the models. Also, two weighting
predicates assign preference to answers. 

Constraint StableProver::Solve(String, THREE, THREE, THREE) : [TWO]
Lattice RFOUR = RAISE(FOUR)
Lattice Interpretation = FDLP(RFOUR, RFOUR, RFOUR)
Lattice WeightedInterpretation = LEXPR(REAL01, REAL01, Interpretation)
Predicate BestModel(void) : [WeightedInterpretation]
Predicate Pref1(FOUR, FOUR) : [REAL01]
Predicate Pref2(FOUR, FOUR) : [REAL01]
Predicate Raise(THREE) : [RFOUR]

BestModel() : LEXPR(V1, V2, CTERM(RA, RB, RC)) ←
    StableProver::Solve(P, A, B, C) : [true],
    Pref1(A, B) : [V1], Pref2(B, C) : [V2],
    Raise(A) : [RA], Raise(B) : [RB], Raise(C) : [RC]

The clause is first instantiated by answers from StableProver::Solve because
the other constraints are initially not evaluable. For each answer, two weights
and three raised-lattice versions of the values of the P-predicates are obtained
as follows: Pref1, Pref2 and Raise pass information between the object and
annotation domains, in this case taking object values and binding annotation
variables. Raise also embeds THREE, which is not a bounded distributive
lattice, in FOUR. These simple predicates can be defined through lists of facts,
e.g. Pref1(t, f) : [0.5] ←. Larger or changing tables might be implemented as
constraints accessing external database tables.

The resulting RFOUR values are combined into an FDLP term which
represents one model. CTERM (Conjunctive Term) is one of the constructor
constraints for the FDLP Interpretation (the other being DTERM, Disjunctive
Term). Its result is $RA \wedge RB \wedge RC$. These operations return one answer each so
the search tree does not branch any further. The lexicographic product of the 
weights and the model is accumulated. Here, the constraint LEXPR is named 
identically to the lattice whose values it constructs. 

The reduction rule which is part of SLG-AL computes the least upper bound 
of all weighted models. If exactly one model has the highest weight, this model 
and its weight are retained. If more than one model has equal highest weight, all 
of these models which are incomparable by the definition of the stable semantics 
are kept as terms of an Interpretation value. 

6 Conclusion 

Based on an analysis of the requirements of annotated logic (AL) in mediator 
applications, we have presented an approach to complex distributive lattice an- 
notations in which large annotation lattices are built bottom-up from elementary 
lattices. The two new lattice composition operations introduced are the free dis- 
tributive lattice product (FDLP) and the lexicographic product. The FDLP is a 
general solution to the multiple annotation problem, whereas the lexicographic 
product allows the selection of optimal answers. The complexity of an AL system 
is not affected prohibitively by the introduction of large numbers of component 
lattices into an AL program, as long as locally, i.e. within each literal, only a 
small number of the components are nontrivial. 

Compared with results regarding join-irreducible representations (JIR) of 
lattice elements [16], the computational complexity of our approach is shown to 
be similar to the JIR approach. Furthermore the methods presented here do not 
require the join-irreducible elements of a lattice to be known. 

All of the results presented here affect only the lattice component in an AL 
system. Therefore, they should transfer easily to lattice-based AL systems other 
than traditional GAPs, such as coherent well-founded AL programs [8] which 
have a different notion of default negation.

Abstract. In this paper we describe a first-order logic inference strategy
based on information extracted from both conjunctive and disjunctive 
normal forms. We claim that the search problem for a proof can bene- 
fit from this further information, extending the heuristic possibilities of 
resolution and connection proof methods. 

Keywords: First-order logic, theorem proving, inference strategy, dual 
transformation. 

Topic: Logic and Symbolic Computing. 



1 Introduction 

There are two main families of automatic theorem proving systems: the sys- 
tems based on the Resolution rule [17] and the systems based on the Connection 
Method [2]. Resolution is the best known and most widely used method for the- 
orem proving. In recent years, several proof strategies have been proposed to 
improve its efficiency [11] and much of the development in expert systems [22] 
and logic programming [18] has been strongly influenced by it. Before a resolu- 
tion method is applied, the negation of the theorem to be proved, along with the 
appropriate hypothesis, must be converted to conjunctive normal form. Resolu- 
tion methods are characterized by a local inference rule - the resolution rule - 
able to generate new clauses - the resolvents - which are logical consequences of 
the clauses already admitted. The termination criterion is the generation of the 
empty clause. Its main disadvantage is that it retains the (non subsumed) newly 
inferred clauses, augmenting the search space at each successful application of 
the resolution rule. 

The connection method has its roots in the Semantic Tableaux [20] and Nat- 
ural Deduction methods [3] and inspired several theorem proving methods (e.g. 
the Consolution Method [10]). Although some methods of the connection family 
work on formulas represented in normal form (e.g., [1]), in general they can be 
applied on formulas expressed in full first-order logic language. The termination
criterion is the generation of a spanning complementary mating of the set of
formulas. A mating is a set of connections, where a connection is an unordered 
pair of literals with the same predicate symbol but different signs. A connec- 
tion is complementary if the two atomic formulas occurring in its literals unify. 
Finally, a complementary mating spans a set of formulas if each path through 
the formula literals contains a connection from the mating. Connection methods 
are high level proof methods, in the sense that they search for a global prop- 
erty of the set of formulas, the spanning matings. Differently from the resolution 
methods, no inferred formula is retained during the deduction process, although, 
for first-order logic, the set of formulas must be expanded (or amplified) by the 
duplication of some of its formulas. But this operation can be performed using 
only indices and not actually duplicating the formulas. 
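The notions of connection, mating and spanning can be made concrete in the ground (propositional) case, where unification reduces to syntactic identity of atoms. The sketch below is an illustration only; literals are strings with a leading "-" for negation, connections are simplified to pairs of literals rather than literal occurrences, and the function names are hypothetical.

```python
from itertools import product

def neg(lit):
    """Complement of a ground literal written as 'p' or '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def connections(clauses):
    """All unordered pairs of complementary ground literals from different clauses."""
    found = set()
    for i, c1 in enumerate(clauses):
        for c2 in clauses[i + 1:]:
            for lit in c1:
                if neg(lit) in c2:
                    found.add(frozenset((lit, neg(lit))))
    return found

def spans(mating, clauses):
    """True if every path (one literal per clause) contains a connection of the mating."""
    return all(any(conn <= set(path) for conn in mating)
               for path in product(*clauses))

# Clause form of the negation of (p and (p -> q)) -> q
clauses = [["p"], ["-p", "q"], ["-q"]]
full_mating = connections(clauses)
```

Here the two connections {p, -p} and {q, -q} together span the clause set, which shows that the original formula is valid; the mating containing only {p, -p} leaves the path (p, q, -q) open.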

The proposed inference method for first-order logic has some characteristics 
that combine resolution and connection methods features. On the one hand, the 
proposed method presents the following properties in common with resolution: 
(i) it demands the problem to be transformed to a normal form, in fact to both 
dual normal forms, (ii) it retains (non subsumed) inferred theorems, (iii) it sup- 
ports a refutation-based theorem proving method, and (iv) it is a local process. 
On the other hand, the proposed method presents the following properties in 
common with the connection method: (i) its proof strategy is based on the com- 
bination of substitutions associated with “connections”, i.e., substitutions that 
unify two complementary literals in different clauses, (ii) it treats linear chains 
quite efficiently, (iii) it supports an affirmative theorem proving method, and 
(iv) it is a global process. 

The particularity of the proposed method comes from the apparent paradox 
between the last properties of the above two lists. This paradox is explained 
by the “holographic” character of the proposed method. On the one hand, the 
atomic goals are identified through a local process applied to the conjunctive 
normal form. This process is analogous to the choice of candidate clauses in the 
resolution method and therefore can benefit from strategies such as the set of 
support [7]. On the other hand, once the goals are defined in the conjunctive 
normal form, the substitutions to be applied are calculated taking into account 
global properties about the contradictory character of the dual clauses that be- 
long to the disjunctive normal form, analogously to the connection method. 
This second aspect of the inference process can benefit from the linear chains 
and hinged loops treatment available in the connection method [3]. 

The main idea of the proposed inference method is to use the information 
about the occurrence of literals within clauses and dual clauses, along with the 
subsumption relation among them, to guide the inference process, when a specific 
goal is given. Intuitively, if we have a clause that contains a literal that unifies 
with the given goal, then it is enough to eliminate all other literals of the clause 
and the goal will be proved, provided that all the involved substitutions combine. 
To eliminate a literal from a clause, we have to find a substitution that turns
into contradictions all the dual clauses that are represented, in that clause, by
the given literal.

The paper is organized as follows. In Section 2, we present the adopted
representation for conjunctive and disjunctive normal forms, where the relations
between literals that occur in both forms are explicitly stored. In Section 3, the 
proposed proof strategy is described. In Section 4, some results of the strategy 
implementation are presented. Finally, in Section 5, we draw some conclusions 
and comment upon future perspectives. 



2 Dual Representation 

Consider the first-order language $L(P, F, C)$, where P, F and C are finite or
countable sets of predicate, function and constant symbols, respectively.
Following the usual definition of terms, atomic formulas, and formulas (e.g., [12]), and
given the formulas $X_1, \ldots, X_n$, we define a generalized disjunction and a
generalized conjunction as $[X_1, \ldots, X_n] = X_1 \vee \cdots \vee X_n$ and
$\langle X_1, \ldots, X_n \rangle = X_1 \wedge \cdots \wedge X_n$,
respectively. A literal is an atomic formula or the negation of an atomic formula,
or one of the constants True or False. A clause is a generalized disjunction in 
which each member is a literal. A dual clause is a generalized conjunction in 
which each member is a literal. 

A first-order formula $W_c$ is in conjunctive normal form or is in clause form
if it is a generalized conjunction $\langle C_1, \ldots, C_n \rangle$ in which each member is a clause.
A first-order formula $W_d$ is in disjunctive normal form or is in dual clause form
if it is a generalized disjunction $[D_1, \ldots, D_n]$ in which each member is a dual
clause. Given an ordinary formula W, i.e., one not restricted to generalized
conjunctions and generalized disjunctions, there are algorithms for converting it
into a formula $W_c$, in clause form, and into a formula $W_d$, in dual clause form,
such that $W \Leftrightarrow W_c \Leftrightarrow W_d$ (e.g., [16], [19], [21]). To transform a formula from
one clause form to the other, what we here call the dual transformation, only
the distributivity of the logical operators $\vee$ and $\wedge$ is needed.

The proposed proof procedure needs, besides the two canonical forms $W_c$ and
$W_d$, some information about the relation between the literals in one form and
the literals in the other form. The clauses in $W_c$ and the dual clauses in $W_d$ are
a kind of “holographic” representation of each other. Each clause in $W_c$ consists
of a combination of all dual clauses in $W_d$ and, conversely, each dual clause in
$W_d$ consists of a combination of all clauses in $W_c$. They are combinations in the
sense that each literal in a clause belongs to a different dual clause. If all literals
in the clause set are ground, then each dual clause will contain exactly one literal
of each clause, but if we have variables and some literals subsume some others,
then a single literal in one dual clause may represent more than one clause. This
representation relation must be captured for our purposes. This is done
through the introduction of the notion of quantum¹.

¹ The metaphoric notations adopted to name the defined mathematical objects
(quantum, coordinates, etc.) are intended to facilitate the understanding of the
algorithm and do not have any further significance.

A quantum is a mathematical object that consists of three elements: a literal
$\phi$ and two sets of integers F, S. If the quantum belongs to $W_c$, its set F of fixed
coordinates indicates which dual clauses contain the literal $\phi$ in their definition.
The set S of subsumed coordinates indicates which dual clauses contain (in their
definitions) literals $\phi'$ such that $\phi$ subsumes $\phi'$. A quantum is noted by $\phi^{F}_{S}$. The
representation is symmetric, i.e., the F and S sets associated with the quanta
that belong to dual clauses represent the information about the presence of
the respective literals (and their subsumed literals) in the clause form.
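The fixed coordinates can be illustrated in the ground case, where no literal properly subsumes another and all S sets are therefore empty. The sketch below is an assumption-laden illustration, not the procedure of [5]: the dual transformation is done by naive distribution, contradictory and subsumed dual clauses are dropped, dual clauses are numbered in an arbitrary but fixed (sorted) order, and all function names are hypothetical.

```python
from itertools import product

def neg(lit):
    """Complement of a ground literal written as 'p' or '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def dual_clauses(clauses):
    """Ground dual transformation: distribute, drop contradictory and subsumed terms."""
    terms = {frozenset(path) for path in product(*clauses)}
    terms = {t for t in terms if not any(neg(l) in t for l in t)}
    # A dual clause is subsumed if a proper subset of it is also a dual clause.
    return sorted((t for t in terms if not any(s < t for s in terms)), key=sorted)

def fixed_coordinates(clauses, duals):
    """For each literal of each clause, the indices of the dual clauses containing it."""
    return [{lit: {k for k, d in enumerate(duals) if lit in d} for lit in clause}
            for clause in clauses]

w_c = [["p", "q"], ["-p", "r"]]          # (p v q) ^ (-p v r)
w_d = dual_clauses(w_c)                  # (-p ^ q) v (p ^ r) v (q ^ r)
quanta = fixed_coordinates(w_c, w_d)
```

In the first clause, the literal p has the fixed coordinates {1} and q has {0, 2}, so eliminating q from that clause would require turning dual clauses 0 and 2 into contradictions.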

The dual transformation is a very expensive procedure and can be traced 
back to Quine [15, 16], who first proposed an algorithm to solve the problem 
of reducing an arbitrary truth-functional formula to a shortest equivalent in 
normal form. This problem, also referred to in the literature as the prime impli- 
cants/implicates determination problem, finds applications in the minimization 
of switching circuits [23] and has since been the subject of several publications 
[19, 13, 14, 21]. The proposed representation is not only useful in the framework 
of the proposed proof strategy, but can also be used as a base for an efficient 
procedure to calculate the dual transformation [5]. This procedure is part of the
theorem proving system whose strategy is presented below. 



3 Strategy 

The proposed strategy can be divided into three steps. Initially, it is necessary 
to determine which dual clauses are eliminated by each substitution that unifies
atomic formulas of literals occurring in different clauses. Next, for each literal, it
is necessary to determine all the different ways that it can be eliminated, i.e., all 
the different ways the dual clauses it represents can be turned into contradictions. 
This part of the calculation depends only on the theory and not on the specific 
goal to be proven. Finally, given a goal to be proven and the clauses containing 
literals that unify with its negation, it is necessary to combine the different ways 
the other literals of the clauses can be eliminated. These three steps are presented 
in the following subsections. 

3.1 Elimination Set 

Given both conjunctive and disjunctive normal forms of a theory, it is necessary
to calculate the set: $\Theta = \{(\theta, P_\theta, E_\theta)\}$. It contains all substitutions that are
able to turn into explicit contradictions one or more dual clauses in $W_d$. Each
element of the set $\Theta$ contains the following elements: (i) $\theta$ - a substitution, (ii)
$P_\theta = \{(\phi_i^{F_i,S_i}, {\phi_i'}^{F_i',S_i'})\}$ - the set of pairs of quanta in the conjunctive normal form
of the theory such that: $\phi_i\theta = \neg\phi_i'\theta$, and (iii) $E_\theta$ - a set of integers containing
the numbers of the dual clauses eliminated^2 by $\theta$.

The problem is how to calculate $E_\theta$ given $P_\theta$, the set of conjunctive normal
form quanta pairs. Clearly, we must have that: $\bigcup_i (F_i \cap F_i') \subseteq E_\theta$ because, by
definition of the set $F$, both literals - $\phi_i$ and $\phi_i'$ - are present in all dual clauses
$k$, such that $k \in \bigcup_i (F_i \cap F_i')$. But there are cases where the $S$ sets should also
play a role; this happens when $\phi_i$ or $\phi_i'$ are ground. The final expression for $E_\theta$
is:

$$E_\theta = \bigcup_i (F_i \cap F_i') \;\cup \bigcup_{i \in G,\, i \notin G'} (S_i \cap F_i') \;\cup \bigcup_{i \notin G,\, i \in G'} (F_i \cap S_i') \;\cup \bigcup_{i \in G,\, i \in G'} ((F_i \cup S_i) \cap (F_i' \cup S_i'))$$

where $G$ contains the indices of the quanta
containing ground or linear^3 literals in the left hand side of the pairs in $P_\theta$ and
$G'$ contains the indices of the quanta containing ground or linear literals in the
right hand side of these pairs.

^2 We use the terms kill and eliminate as abbreviations of "turn into explicit contradictions".
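The case analysis behind $E_\theta$ can be sketched in a few lines of Python. This is an illustrative reading of the expression, not the authors' implementation: the function name `elimination_set` and the tuple layout (the $F$ and $S$ sets of each quantum plus a flag saying whether its literal is ground or linear) are assumptions. The two calls use the quanta pairs listed for Example 1.

```python
def elimination_set(pairs):
    """Compute the dual clauses eliminated by one substitution.

    pairs: iterable of (F, S, g, F2, S2, g2), where F/S and F2/S2 are the
    coordinate sets of the left and right quantum of a pair and g/g2 tell
    whether the corresponding literal is ground or linear.
    """
    eliminated = set()
    for F, S, g, F2, S2, g2 in pairs:
        eliminated |= F & F2                     # both literals are present
        if g and not g2:
            eliminated |= S & F2                 # only the left literal qualifies
        elif g2 and not g:
            eliminated |= F & S2                 # only the right literal qualifies
        elif g and g2:
            eliminated |= (F | S) & (F2 | S2)    # both literals qualify
    return eliminated


# Example 1, first element: pair (~Q(x1), Q(f(x0))), neither ground nor linear.
first = elimination_set([({0, 1}, set(), False, {0, 2}, set(), False)])
# Example 1, second element: pair (~Q(f(a)), Q(f(x0))), left literal ground.
second = elimination_set([({2, 3}, {0, 1}, True, {0, 2}, set(), False)])
print(first, second)
```

The second call picks up dual clause 0 only through the $S$ set of the ground literal, which is the situation discussed in the paragraphs that follow.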

The reason why we can include these extra dual clauses into the set of dual
clauses that $\theta$ is able to eliminate is the following. Suppose a dual clause $k$ and
a pair $(\phi_j^{F_j,S_j}, {\phi_j'}^{F_j',S_j'}) \in P_\theta$ where $\phi_j$ is ground or linear (i.e., $j \in G$), $\phi_j'$ is not
ground (i.e., $j \notin G'$), $k \in S_j$ and $k \in F_j'$. In this case, dual clause $k$ contains
literal $\phi_j'$ because $k \in F_j'$, but does not contain literal $\phi_j$, because it is only
present in dual clauses whose numbers are in $F_j$. But, by the definition of the
set $S$, there is surely some non ground literal $\varphi(x)$ in clause $k$ and substitution
$\sigma$, such that $\varphi(x)\sigma = \phi_j$, where $x$ stands for the set of variables that occur
in the literal $\varphi$. So we can write dual clause $k$ as: $(\ldots, \phi_j', \varphi(x), \ldots)$ and, if
we apply $\theta$ to it, we obtain: $(\ldots, \phi_j'\theta, \varphi(x)\theta, \ldots)$ which might not be explicitly
contradictory because we only have that: $\phi_j = \neg\phi_j'\theta$ and not that $\phi_j = \neg\varphi(x)\theta$.

But we can freely introduce a $\phi_j$ in dual clause $k$, before we apply $\theta$ to it. The
reason why we can do that is the idempotency of the $\wedge$ operator, even inside a
disjunctive formula: $[\Gamma_1[x], \ldots, \Gamma_n[x], (\varphi(x), \ldots)] \Leftrightarrow [\Gamma_1[x], \ldots, \Gamma_n[x], (\varphi(x), \varphi(y), \ldots)]$
given that the set $y$ of variables contains only new variables that do not
appear anywhere else. It is easy to see, by the definition of the set $S$, that there
is a substitution $\omega$ such that: $\varphi(y)\omega = \phi_j$. If we apply this substitution $\omega$ to
the modified dual clause, we obtain: $(\ldots, \phi_j', \varphi(x), \phi_j, \ldots)$ which is equivalent
to the original dual clause $k$ but, nevertheless, is contradictory under $\theta$.

The most interesting point is that we do not need to transform the substitution
$\theta$ into $\theta\omega$ in the set $\Theta$, because, as the variables in $\omega$ do not occur anywhere
else in the problem, they have no effect in the global solution (in this case, what
we are looking for).

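Example 1. Consider the theory:

$$\begin{aligned}
W_c = (\,&0 : [P_1(x_0)^{\{1,3\},\emptyset}, Q(f(x_0))^{\{0,2\},\emptyset}],\quad 1 : [\neg Q(f(a))^{\{2,3\},\{0,1\}}],\\
&2 : [\neg Q(x_1)^{\{0,1\},\emptyset}, P_2(x_1)^{\{2,3\},\emptyset}]\,)\\
W_d = [\,&0 : (Q(f(x_0))^{\{0\},\emptyset}, \neg Q(x_1)^{\{2\},\{1\}}),\quad 1 : (P_1(x_0)^{\{0\},\emptyset}, \neg Q(x_1)^{\{2\},\{1\}}),\\
&2 : (P_2(x_1)^{\{2\},\emptyset}, \neg Q(f(a))^{\{1\},\emptyset}, Q(f(x_0))^{\{0\},\emptyset}),\\
&3 : (P_1(x_0)^{\{0\},\emptyset}, P_2(x_1)^{\{2\},\emptyset}, \neg Q(f(a))^{\{1\},\emptyset})\,]
\end{aligned}$$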



Here, in this case, the set $\Theta$ contains the elements:
$(\{x_1/f(x_0)\}, \{(\neg Q(x_1)^{\{0,1\},\emptyset}, Q(f(x_0))^{\{0,2\},\emptyset})\}, \{0\})$ and
$(\{x_0/a\}, \{(\neg Q(f(a))^{\{2,3\},\{0,1\}}, Q(f(x_0))^{\{0,2\},\emptyset})\}, \{0,2\})$.
The first element corresponds to the simple case where both literals
in the pair are non ground, and the eliminated dual clause is just that where
both are present (dual clause 0). The pair of the second element presents one
ground literal $\neg Q(f(a))$ and the associated quantum has a non empty $S$ set -
$\{0,1\}$. In this case, additionally to the dual clause where both literals of the
pair are present (dual clause 2), the substitution also eliminates dual clause
0 - $(Q(f(x_0)), \neg Q(x_1))$, because this dual clause is equivalent to the following
dual clause: $(Q(f(x_0)), \neg Q(x_1), \neg Q(y))$. And, if we apply the substitution $\omega =
\{y/f(a)\}$ to this clause, we obtain: $(Q(f(x_0)), \neg Q(x_1), \neg Q(f(a)))$ which becomes
contradictory after the application of the substitution $\theta = \{x_0/a\}$: $(Q(f(a)),
\neg Q(x_1), \neg Q(f(a)))$. It should be noted that these results would be exactly the
same if the literal in the second clause - $\neg Q(f(a))$ - was linear, e.g., $\neg Q(f(z))$,
instead of being ground.

^3 We call a non ground literal linear in a clause if its variables don't occur anywhere
else in the clause (and because of renaming of variables, also in the theory).

In this same case, where $\phi_j$ is a ground literal and $\phi_j'$ is a non ground and non
linear one, it should be noted that the dual clause numbers in the $S$ set associated
with the non ground literal - $S_j'$ - are not taken into account in the calculation of
the elimination set. This is because, to apply the same duplication trick to this
literal, we have to identify a literal $\varphi'(x)$ that subsumes $\phi_j'$ and that is present
in dual clause $k$. But if we duplicate this literal, obtaining $\varphi'(z)$, and apply to
it a suitable substitution $\omega'$, we will obtain, as expected, literal $\phi_j'$ which has
variables that are not linear in the dual clause set (because, by hypothesis, $\phi_j'$ is
non ground and non linear) and this would change the semantics of the theory.
Nevertheless, the $S$ set in this case has a role to play in the proposed strategy
(see example 3 in section 3.2). If both $\phi_j$ and $\phi_j'$ were non ground literals, then
$\varphi(y)\omega\theta$ and $\phi_j'\theta$ would be identical and non ground, therefore they would contain
the same variables. But a dual clause with two literals containing the same
variables has no meaning, because each literal must represent different clauses
and therefore, because of renaming of variables, must have distinct variables. If
we transform the modified theory to conjunctive normal form and rename the
variables in the clauses, we will obtain a theory with semantics different from
the original one.



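Example 2. Consider now the theory:

$$\begin{aligned}
W_c = (\,&0 : [Q(f(x_0))^{\{0,1,2\},\emptyset}],\quad 1 : [\neg Q(f(x_1))^{\{1\},\{0\}}, P_1(x_1)^{\{2\},\emptyset}],\\
&2 : [\neg Q(x_2)^{\{0\},\emptyset}, P_2(x_2)^{\{1,2\},\emptyset}]\,)\\
W_d = [\,&0 : (Q(f(x_0))^{\{0\},\emptyset}, \neg Q(x_2)^{\{2\},\{1\}}),\\
&1 : (P_2(x_2)^{\{2\},\emptyset}, \neg Q(f(x_1))^{\{1\},\emptyset}, Q(f(x_0))^{\{0\},\emptyset}),\\
&2 : (P_1(x_1)^{\{1\},\emptyset}, P_2(x_2)^{\{2\},\emptyset}, Q(f(x_0))^{\{0\},\emptyset})\,]
\end{aligned}$$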



In this case, the set $\Theta$ contains the elements:
$(\{x_2/f(x_0)\}, \{(\neg Q(x_2)^{\{0\},\emptyset}, Q(f(x_0))^{\{0,1,2\},\emptyset})\}, \{0\})$, and
$(\{x_1/x_0\}, \{(\neg Q(f(x_1))^{\{1\},\{0\}}, Q(f(x_0))^{\{0,1,2\},\emptyset})\}, \{1\})$.
The first element corresponds again to the simple case where both literals
in the pair are non ground and have empty $S$ sets. Then the eliminated dual
clause is just that where both are present (dual clause 0). The pair in the second
element presents one literal $\neg Q(f(x_1))$ with a non empty $S$ set - $\{0\}$ -, but now
it is non ground and non linear. If we try to apply the literal duplication trick to
dual clause 0, we obtain: $(Q(f(x_0)), \neg Q(x_2), \neg Q(y))$. And, if we apply the
substitution $\omega = \{y/f(x_0)\}$ to this clause, we obtain: $(Q(f(x_0)), \neg Q(x_2), \neg Q(f(x_0)))$
which is indeed contradictory, but is non linear with respect to variable $x_0$,
something that is not allowed in a dual clause.



3.2 Elimination Graph 

Suppose a unitary goal $\neg\psi$ is to be proven, given a theory represented by both
normal forms $W_c$ and $W_d$. Initially, we look for the set of clauses in $W_c$ that
contain one quantum $\phi^{F,S}$ such that there exists a substitution $\theta$ that verifies:
$\phi\theta = \psi\theta$. Call this set $C_\psi$, and consider one of the clauses belonging to it:
$[\phi^{F,S}, \phi_1^{F_1,S_1}, \ldots, \phi_k^{F_k,S_k}]$.
Each of the quanta in this clause represents, through its $F$ and $S$ sets, some
set of dual clauses in $W_d$. If we find a substitution, call it $\sigma$, that turns into
contradictions all the dual clauses whose numbers are in the set $\bigcup_{i=1}^{k}(F_i \cup S_i)$,
then all the quanta $\phi_i^{F_i,S_i}$, $i = 1, \ldots, k$, can be eliminated from the clause. If we
apply substitution $\sigma$ to $W_c$ and $W_d$, we obtain a new theory where the original
clause has been reduced to: $[\phi\sigma]$.
If it is possible to combine substitutions $\theta$ and $\sigma$, then we have: $\phi\theta\sigma = \psi\theta\sigma$
and the substitution $\theta\sigma$ is an answer to the given goal. Otherwise, $\phi\sigma$ can be used
as a new hypothesis that should be incorporated into the theory. Therefore, the
proposed strategy is designed to find an adequate combination of substitutions,
belonging to the elements of the set $\Theta$, that eliminates all dual clauses $k \in
\bigcup_{i=1}^{k}(F_i \cup S_i)$, for each clause in the set $C_\psi$.

Analogously to all proof procedures, what we have is a standard state space 
search problem. The originality of the proposed strategy can be stated in two 
points: (i) the information contained in the set $\Theta$ comes from both normal forms,
allowing a combination of techniques from the resolution and connection fam- 
ilies of theorem proving methods, and (ii) the search for the set of combined 
substitutions that eliminate one literal in a clause is independent of the eventual 
goal to be proven and, given a theory, can be performed only once and stored 
for later use. 

Consider first the problem of eliminating one literal in a clause. In this case 
we have two types of information to explore. On the one hand, the substitutions 
to be combined should span an acyclic graph in the conjunctive normal form, 
with the root node in the clause to which the literal to be eliminated belongs. 
The elements in the $\Theta$ set contain the necessary information to construct such
a graph in their lists of pairs. In fact, each pair contains two quanta, and although
these are conjunctive normal form quanta, whose $F$ and $S$ sets contain information
about the disjunctive normal form, their corresponding quanta in the
disjunctive normal form (which we call their mirror quanta) contain in their $F$
and $S$ sets the information about the clauses where the respective literals occur.
This information reduces drastically the number of elements of the $\Theta$ set that
should be tested, because we just have to consider those that connect the present
clause to another one.
On the other hand, the $F$ and $S$ sets of the quantum that contains the literal to
be eliminated determine the set of numbers that designate the dual clauses that
should be eliminated. Each element in the set $\Theta$ also contains a set of numbers
associated with the dual clauses it eliminates. The match between these two
sets of numbers can also be used to reduce the search space of $\Theta$ elements
whose substitutions should be tested for combination. Elements of $\Theta$ that don't
eliminate any of the dual clauses we want to kill should not be considered in the
search.

The search problem can be defined as the process of combining graph fragments
where the nodes of each of these graph fragments are labelled with clause
numbers and the edges with elements of the set $\Theta$. Formally, each of these
fragments is represented by a pair $(E, G)$, where the set $E$ contains the numbers of
the dual clauses eliminated by the fragment and $G = (N, A)$ is a graph with $N$,
the set of nodes, a subset of the set of clause numbers and $A$, the set of edges,
given by tuples $(n_1, n_2, e_\theta)$ such that $n_1$ and $n_2$ are clause numbers and $e_\theta \in \Theta$.

Initially, it is necessary to construct a set of basic fragments. The basic fragments
are constructed from the elements of $\Theta$. The dual clauses eliminated by
each basic fragment are the same eliminated by the associated $\Theta$ element, and
the graph associated with it is constructed according to the pair set of the $\Theta$
element. Each pair in this set gives rise to one or more edges, depending on the $F$
sets of the mirror quanta associated with the quanta in the pair. More formally,
given $e_\theta = (\theta, P_\theta, E_\theta) \in \Theta$ with $P_\theta = \{(\phi^{F,S}, {\phi'}^{F',S'})\}$, let
$\phi^{F_m,S_m}$ and ${\phi'}^{F_m',S_m'}$
be the set of mirror quanta associated with the quanta in $P_\theta$. In this case, $e_\theta$
gives rise to the following basic fragment: $(E_\theta, (F_m \cup F_m', \{(n, m, e_\theta), (m, n, e_\theta) \mid
n \in F_m \text{ and } m \in F_m'\}))$.

If there is more than one element in $\Theta$ that eliminates the same set of dual
clauses, i.e., that has the same $E_\theta$ set, we combine their graphs together, by
making the union of the node and edge sets, respectively, and generate just one
basic fragment. This further reduces the search space. In the basic fragment
graphs, all edges are bidirectional, except those that have one node associated
with a ground clause; in this case the edge is directed to the ground clause.
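The construction of basic fragments described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `basic_fragments`, the dictionary layout, and the string names of the $\Theta$ elements are hypothetical, and the rule for ground clauses is implemented by simply never starting an edge at a ground clause. The input reproduces the three $\Theta$ elements of Example 4 below, with the $F$ sets of their mirror quanta.

```python
def basic_fragments(elements, ground_clauses):
    """Group Theta elements into basic fragments.

    elements: dict name -> (E, Fm, Fm2) where E is the set of eliminated
    dual clauses and Fm, Fm2 are the F sets of the two mirror quanta.
    Returns a dict mapping frozenset(E) -> (nodes, edges); an edge is a
    tuple (source clause, target clause, element name).
    """
    fragments = {}
    for name, (E, Fm, Fm2) in elements.items():
        # Elements with the same E set share a single basic fragment.
        nodes, edges = fragments.setdefault(frozenset(E), (set(), set()))
        for n in Fm:
            for m in Fm2:
                nodes.update((n, m))
                if n not in ground_clauses:
                    edges.add((n, m, name))   # edge n -> m
                if m not in ground_clauses:
                    edges.add((m, n, name))   # edge m -> n
    return fragments


# Theta elements of Example 4; clauses 0 and 1 are ground.
theta = {
    "e1": ({0, 2}, {2}, {0}),
    "e2": ({0, 2}, {2}, {1}),
    "e3": ({1, 3}, {4}, {2}),
}
frags = basic_fragments(theta, ground_clauses={0, 1})
for eliminated, (nodes, edges) in frags.items():
    print(sorted(eliminated), sorted(nodes), sorted(edges))
```

Grouping by the eliminated set is what turns three $\Theta$ elements into only two fragments here.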

A special case occurs when, in one element of $P_\theta$, $\phi$ is ground, $\phi'$ is not ground
and the intersection of $F \cup S$ and $S'$ is not empty. In this case, it is necessary
to find all literals $\psi$ such that $\psi$ subsumes $\phi'$. Let $\psi^{F_\psi,S_\psi}$ and $\psi^{F_{\psi,m},S_{\psi,m}}$ be their associated
quanta and respective mirror quanta. For each $\psi$, we should
verify if the intersection: $(F \cup S) \cap S' \cap F_\psi$ is not empty and, if it is the case,
then we should include the following edges in the graph associated with the basic
fragment: $\{(n, m, e_\theta), (m, n, e_\theta) \mid n \in F_m \text{ and } m \in F_{\psi,m}\}$.
Example 3. Consider the theory given by:

[Theory display: $W_c$ with clauses 0 to 2 and $W_d$ with dual clauses 0 to 2; the only legible quantum is $P(f(a))^{\{0,2\},\{1\}}$.]



In this case, the set $\Theta$ contains the elements: $(\{x_1/f(a)\}, \{(\ldots,
P(f(a))^{\{0,2\},\{1\}})\}, \{0\})$ and $(\{x_0/a\}, \{(\ldots, P(f(a))^{\{0,2\},\{1\}})\},
\{0,2\})$. The second element of the set $\Theta$ is an instance of this special case.
There we have $\phi' = \ldots$ and $\psi = \ldots$; in this case, we have: $F \cap S' \cap
F_\psi = \{0,2\} \cap \{0\} \cap \{0\} = \{0\}$ and we should include the following edges:
$\{(1, 2, e_{\{x_0/a\}}), (2, 1, e_{\{x_0/a\}})\}$.



The search begins with the basic fragments that contain, in the pair set of the
$\Theta$ elements occurring in the edges of their graph, quanta associated with the
literal we want to eliminate. These basic fragments are joined together into a
general fragment. The only difference between basic and general fragments is
that general fragment graphs only have unidirectional edges. Initially, all the
edges begin in the clause associated with the literal we want to eliminate; the
ends of all these edges determine the fringe of the graph.

Given a general fragment, the search proceeds by looking into the basic frag- 
ment set for one that has a graph with a node labelled with the same clause 
as those labelling the fringe nodes of the graph in the given general fragment. 
Augmenting the graph only from its fringe nodes guarantees that we will get an 
acyclic graph as a result. Once an adequate basic fragment is found, its graph is
joined with the graph in the given general fragment, if the substitutions associated
with the $\Theta$ elements in neighbor edges combine. It is important to note
that, when an adequate basic fragment is found, only the edges of its graph 
that begin in clauses of the fringe of the graph in the current general fragment 
are included, keeping the general fragment graph acyclic and only containing 
unidirectional edges. 

Each graph may store several different substitutions, because only the substi- 
tutions in the same path can be combined together. Another advantage is that 
the complete substitutions associated with each path don’t need to be stored 
during the search; it is only necessary that the substitution associated with a 
new edge combine with the substitution in the previous edge. This is so because, 
the graph being acyclic, the paths in it never return to the same clause and 
therefore variables are never repeated along one path. 

Besides the fact that we only choose basic fragments that lead to the expan- 
sion of the fringe nodes of the graph, the search for basic fragments is further 
restricted by the number of dual clauses that these basic fragments eliminate. 
Once an adequate basic fragment is found and its graph is joined with the
graph of the given general fragment, a new general fragment is generated. This
new general fragment eliminates the dual clauses corresponding to the union
of those eliminated by the given general fragment and those eliminated by the
basic fragment found. From this new general fragment the search can proceed,
until a general fragment is generated that kills all the dual clauses we want to
eliminate, finishing the search.
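The fringe-driven search can be illustrated with a small greedy sketch. It is a simplification, not the procedure implemented by the authors: there is no backtracking, the name `eliminate` and its parameters are hypothetical, and the test for whether two substitutions combine is passed in as the assumed predicate `compatible`, which defaults to always succeeding. The data are the two basic fragments of Example 4 below.

```python
def eliminate(root, targets, fragments, compatible=lambda prev, new: True):
    """Grow a general fragment from clause `root` until `targets` are killed.

    fragments: list of (E, edges), edges being (source, target, element).
    Returns (eliminated dual clauses, edges of the graph) or None.
    """
    eliminated, graph = set(), set()
    reached_by = {root: None}   # clause -> element on the edge entering it
    fringe = {root}
    progress = True
    while not targets <= eliminated and progress:
        progress = False
        for E, edges in fragments:
            if not (E & (targets - eliminated)):
                continue        # kills nothing we still need
            # Only edges leaving the fringe towards unvisited clauses are
            # kept, so the graph stays acyclic and unidirectional.
            new = {(n, m, e) for (n, m, e) in edges
                   if n in fringe and m not in reached_by
                   and compatible(reached_by[n], e)}
            if not new:
                continue
            graph |= new
            eliminated |= E
            fringe = (fringe - {n for n, _, _ in new}) | {m for _, m, _ in new}
            for _, m, e in new:
                reached_by[m] = e
            progress = True
    return (eliminated, graph) if targets <= eliminated else None


f1 = ({0, 2}, {(2, 0, "e1"), (2, 1, "e2")})
f2 = ({1, 3}, {(2, 4, "e3"), (4, 2, "e3")})
solution = eliminate(4, {0, 1, 3}, [f1, f2])   # literal in clause 4
failure = eliminate(3, {0, 1, 2, 5}, [f1, f2])  # literal in clause 3
print(solution, failure)
```

Only the substitution on the edge entering a node is consulted when a new edge is added, mirroring the remark that complete path substitutions need not be stored during the search.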

Example 4. Consider the theory:

$$\begin{aligned}
W_c = (\,&0 : [P(a)^{\{0,1,2,3,4,5\},\emptyset}],\quad 1 : [P(b)^{\{0,1,2,3,4,5\},\emptyset}],\\
&2 : [\neg P(x_0)^{\{0,2\},\emptyset}, Q(f(x_0))^{\{1,3,4,5\},\emptyset}],\\
&3 : [\neg P(f(x_1)), \ldots],\quad 4 : [\neg Q(f(x_2))^{\{0,1,3\},\emptyset}, \ldots]\,)
\end{aligned}$$

[$W_d$: six dual clauses, numbered 0 to 5, each containing $P(a)^{\{0\},\emptyset}$ and $P(b)^{\{1\},\emptyset}$; the remaining quanta are only partially legible.]

In this case, the set $\Theta$ contains the elements:

$e_1 = (\{x_0/a\}, \{(\neg P(x_0)^{\{0,2\},\emptyset}, P(a)^{\{0,1,2,3,4,5\},\emptyset})\}, \{0,2\})$,
$e_2 = (\{x_0/b\}, \{(\neg P(x_0)^{\{0,2\},\emptyset}, P(b)^{\{0,1,2,3,4,5\},\emptyset})\}, \{0,2\})$ and
$e_3 = (\{x_2/x_0\}, \{(\neg Q(f(x_2))^{\{0,1,3\},\emptyset}, Q(f(x_0))^{\{1,3,4,5\},\emptyset})\}, \{1,3\})$.

These three elements give rise to two basic fragments, because we join in a
single basic fragment all the substitutions that eliminate the same set of dual
clauses. The two basic fragments (see figure 1 (a) and (b)) are given by:
$f_1 = (\{0, 2\}, (\{0, 1, 2\}, \{(2, 0, e_1), (2, 1, e_2)\}))$ and
$f_2 = (\{1, 3\}, (\{2, 4\}, \{(2, 4, e_3), (4, 2, e_3)\}))$.

Given a goal $\psi = \neg R(y)$, the set $C_\psi$ would contain clauses 3 and 4. To
eliminate literal $\neg P(f(x_1))$ in clause 3, it would be necessary to eliminate dual
clauses $\{0, 1, 2, 5\}$, according to the $F$ and $S$ sets of the quantum associated
with it. It is easy to see that no combination of elements of $\Theta$ can kill these dual
clauses. On the other hand, the literal $\neg Q(f(x_2))$ in clause 4 can be eliminated.
The search begins with fragment $f_2$, because $\Theta$ element $e_3$, which occurs in the
edge of its graph, contains, in its pair set, the quantum $\neg Q(f(x_2))^{\{0,1,3\},\emptyset}$ associated
with the literal $\neg Q(f(x_2))$ we want to kill. In this case the search is trivial
and the solution is given by the following general fragment (see figure 1 (c)):
$(\{0, 1, 2, 3\}, (\{0, 1, 2, 4\}, \{(4, 2, e_3), (2, 0, e_1), (2, 1, e_2)\}))$ which is the combination
of basic fragments $f_1$ and $f_2$ and eliminates dual clauses $\{0, 1, 2, 3\}$. The resulting
substitutions are obtained by combining the goal substitution $\theta = \{y/x_2\}$
with the substitutions in the $\Theta$ elements in each of the paths in the graph of the
solution fragment: $\{y/x_2, x_2/x_0, x_0/a\}$ and $\{y/x_2, x_2/x_0, x_0/b\}$.
Fig. 1. Graphs: (a) fragment $f_1$, (b) fragment $f_2$, (c) solution fragment



3.3 Combining Elimination Graphs 

Consider the following clause containing the literal $\phi$ we want to eliminate:
$[\phi^{F,S}, \phi_1^{F_1,S_1}, \ldots, \phi_k^{F_k,S_k}]$. If, differently from the examples presented in the previous
section, $k > 1$, then the elimination graphs associated with literals $\phi_1, \ldots, \phi_k$
must be combined in a single elimination graph. This combined elimination graph
differs from the elimination graphs associated with single fragments in that, in
the latter, each path corresponds to one substitution and, in the former, a single
substitution may be represented by more than one path. Because of this, combined
elimination graphs have more than one fringe, each fringe associated with
one substitution.

The simplest case is when the graphs to be combined involve different variables.
In this case, the graphs have only to be merged together. The only difficulty
is the definition of the set of fringes: each fringe of the combined graph is
composed of exactly one path of each of the elimination graphs to be combined.
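For this simplest case, the set of fringes is just the Cartesian product of the path sets of the graphs being combined. The sketch below assumes that each path is summarized by the substitution it carries, stored as a dictionary from variable names to terms, and that the graphs share no variables; the function name `combined_fringes` is hypothetical. The sample data correspond to Example 5 below, where one graph has a single path and the other has two.

```python
from itertools import product


def combined_fringes(path_sets):
    """Return one merged substitution per fringe of the combined graph.

    path_sets: one list of paths per elimination graph, each path being a
    dict variable -> term. The graphs are assumed to share no variables,
    so merging never overwrites a binding.
    """
    fringes = []
    for choice in product(*path_sets):   # exactly one path from each graph
        merged = {}
        for path in choice:
            merged.update(path)
        fringes.append(merged)
    return fringes


graph_a = [{"x2": "x0", "x0": "a"}]
graph_b = [{"x3": "x1", "x1": "b"}, {"x3": "x1", "x1": "c"}]
fringes = combined_fringes([graph_a, graph_b])
for fringe in fringes:
    print(fringe)
```

When the graphs do share variables, a plain dictionary merge is no longer enough, which is the situation treated after Example 5.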



Fig. 2. Elimination Graphs: (a) (b) (c) combined graph



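Example 5. Consider the following theory:

$$\begin{aligned}
W_c = (\,&0 : [P(a)^{\{0,1,2,3,4,5\},\emptyset}],\quad 1 : [Q(b)^{\{0,1,2,3,4,5\},\emptyset}],\quad 2 : [Q(c)^{\{0,1,2,3,4,5\},\emptyset}],\\
&3 : [\neg P(x_0)^{\{0,1\},\emptyset}, P(f(x_0))^{\{2,3,4,5\},\emptyset}],\\
&4 : [\neg Q(x_1)^{\{0,2\},\emptyset}, Q(f(x_1))^{\{1,3,4,5\},\emptyset}],\\
&5 : [\neg P(f(x_2))^{\{3\},\{0,1\}}, \neg Q(f(x_3))^{\{4\},\{0,2\}}, R(x_2, x_3)^{\{5\},\emptyset}]\,)\\
W_d = [\,&0 : (P(a)^{\{0\},\emptyset}, Q(b)^{\{1\},\emptyset}, Q(c)^{\{2\},\emptyset}, \neg P(x_0)^{\{3\},\{5\}}, \neg Q(x_1)^{\{4\},\{5\}}),\\
&1 : (P(a)^{\{0\},\emptyset}, Q(b)^{\{1\},\emptyset}, Q(c)^{\{2\},\emptyset}, Q(f(x_1))^{\{4\},\emptyset}, \neg P(x_0)^{\{3\},\{5\}}),\\
&2 : (Q(b)^{\{1\},\emptyset}, Q(c)^{\{2\},\emptyset}, P(a)^{\{0\},\emptyset}, P(f(x_0))^{\{3\},\emptyset}, \neg Q(x_1)^{\{4\},\{5\}}),\\
&3 : (Q(b)^{\{1\},\emptyset}, Q(c)^{\{2\},\emptyset}, Q(f(x_1))^{\{4\},\emptyset}, \neg P(f(x_2))^{\{5\},\emptyset}, P(f(x_0))^{\{3\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&4 : (Q(b)^{\{1\},\emptyset}, Q(c)^{\{2\},\emptyset}, Q(f(x_1))^{\{4\},\emptyset}, \neg Q(f(x_3))^{\{5\},\emptyset}, P(f(x_0))^{\{3\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&5 : (R(x_2, x_3)^{\{5\},\emptyset}, Q(f(x_1))^{\{4\},\emptyset}, Q(c)^{\{2\},\emptyset}, Q(b)^{\{1\},\emptyset}, P(a)^{\{0\},\emptyset}, P(f(x_0))^{\{3\},\emptyset})\,]
\end{aligned}$$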

and a goal to be proven of the form $\neg R(a, y)$. In this case, we should eliminate
the literals $\neg P(f(x_2))$ and $\neg Q(f(x_3))$. The search for elimination fragments for
these two literals results in two fragments. The first one, see figure 2 (a), has
a graph with only one path and the other, see figure 2 (b), has a graph with
two paths. The combination is shown in figure 2 (c); it has two fringes (1 and
2) which correspond to two different solutions: $\{x_3/x_1, x_2/x_0, x_0/a, x_1/b\}$ and
$\{x_3/x_1, x_2/x_0, x_0/a, x_1/c\}$.

If the graphs to be combined share variables, there are two possibilities: if the 
variables in each of the edges contain all the variables of a given clause, then 
these edges can be maintained in parallel, because they represent two different 
instances of the covered clause. Otherwise the substitutions in each edge should 
be combined and the parallel edges should be considered as part of the same 
path. 



Fig. 3. Elimination Graphs: (a) $\neg Q(a)$, (b) $\neg Q(b)$, (c) combined graph



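Example 6. Consider the theory given by:

$$\begin{aligned}
W_c = (\,&0 : [P(a)^{\{0,1,2,3,4,5\},\emptyset}],\quad 1 : [P(b)^{\{0,1,2,3,4,5\},\emptyset}],\\
&2 : [\neg P(x_0)^{\{0,1,2\},\emptyset}, Q(x_0)^{\{3,4,5\},\emptyset}],\\
&3 : [\neg Q(a)^{\{2,4\},\emptyset}, \neg Q(b)^{\{1,3\},\emptyset}, R(c)^{\{0,5\},\emptyset}]\,)\\
W_d = [\,&0 : (\neg P(x_0)^{\{2\},\emptyset}, R(c)^{\{3\},\emptyset}, P(b)^{\{1\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&1 : (\neg P(x_0)^{\{2\},\emptyset}, \neg Q(b)^{\{3\},\emptyset}, P(b)^{\{1\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&2 : (\neg P(x_0)^{\{2\},\emptyset}, \neg Q(a)^{\{3\},\emptyset}, P(b)^{\{1\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&3 : (Q(x_0)^{\{2\},\emptyset}, \neg Q(b)^{\{3\},\emptyset}, P(b)^{\{1\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&4 : (Q(x_0)^{\{2\},\emptyset}, \neg Q(a)^{\{3\},\emptyset}, P(b)^{\{1\},\emptyset}, P(a)^{\{0\},\emptyset}),\\
&5 : (P(b)^{\{1\},\emptyset}, Q(x_0)^{\{2\},\emptyset}, R(c)^{\{3\},\emptyset}, P(a)^{\{0\},\emptyset})\,]
\end{aligned}$$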
and a goal to be proven of the form $\neg R(c)$. In this case, we should eliminate
the literals $\neg Q(a)$ and $\neg Q(b)$. The search for elimination fragments for these
two literals results in two fragments and each of them has only one path (see
figure 3 (a) and (b)). The combination is shown in figure 3 (c); it has one fringe
that corresponds to two different solutions: $\{x_0/a\}$ and $\{x_0/b\}$.

4 Results 

To test the proposed strategy, the LOGIK system has been implemented. The 
system is an object-oriented laboratory for first-order logic, written in Common 
Lisp/CLOS, which includes the proposed proof strategy and a dual transforma- 
tion algorithm. All entities of the logical syntax - variables, functions, predicates, 
terms, literals, clauses, dual clauses and substitutions - have been implemented 
as classes with their associated manipulation methods - substitution, unification 
and subsumption. To test the strategy, we used examples from the TPTP problem
library. The results obtained, even with this experimental implementation
where no special concern was taken over performance, are promising. 

5 Conclusion 

We have presented a proof strategy for first-order logic based on the complemen- 
tary information that can be extracted from both conjunctive and disjunctive 
normal forms and mainly from the relations between literals that appear in each 
one of them. This strategy integrates a theorem proving system [6] that is part 
of a more ambitious cognitive modeling project [4], [9], [8]. The originality of the 
proposed strategy lies in the fact that, because it uses the information contained 
in both normal forms, it allows a combination of techniques from the resolution 
and connection families of theorem proving methods. Besides that, the results 
obtained during the search for combined substitutions that eliminate each lit- 
eral in a clause are independent of the eventual theorem to be proven and, given 
a theory, need be performed only once and stored for later use. The proposed 
strategy has been implemented and tested, showing promising results. 
Abstract. We present an application of the ACL2 theorem prover to
formalize and reason about rewrite systems theory. This can be seen
as a first approach to apply formal methods, using ACL2, to the design
of symbolic computation systems, since the notion of rewriting or
simplification is ubiquitous in such systems. We concentrate here on
formalization and representation aspects of abstract reduction and term
rewriting systems, using the first-order, quantifier-free ACL2 logic based
on Common Lisp.



1 Introduction 

We report in this paper the status of our work on the application of the ACL2 
theorem prover to reason about rewrite systems theory: confluence, local con- 
fluence, noetherianity, normal forms and other related concepts have been for- 
malized in the ACL2 logic and some results about abstract reduction relations 
and term rewriting systems have been mechanically proved, including Newman’s 
lemma and the Knuth-Bendix critical pair theorem. 

ACL2 is both a logic and a mechanical theorem proving system supporting 
it. The ACL2 logic is an existentially quantifier-free, first-order logic with equality.
ACL2 is also a programming language, an applicative subset of Common
Lisp. The system evolved from the Boyer-Moore theorem prover, also known as 
Nqthm. 

A formal proof using a theorem proving environment provides not only formal 
verification of mathematical theories, but allows us to understand and examine 
their theorems with much greater detail, rigor and clarity. On the other hand, the 
notion of rewriting or simplification is a crucial component in symbolic computa- 
tion: simplification procedures are needed to transform complex objects in order 
to obtain equivalent but simpler objects and to compute unique representations 
for equivalence classes (see, for example, [4] or [9]). 

* This work has been supported by DGES/MEC: Projects PB96-0098-C04-04 and 
PB96-1345 
Since ACL2 is also a programming language, this work can be seen as a
first step to obtain verified executable Common Lisp code for components of 
symbolic computation systems. Although a fully verified implementation of such 
a system is currently beyond our possibilities, several basic algorithms can be 
mechanically “certified” and integrated as part of the whole system. 

We also show here how a weak logic like the ACL2 logic (no quantification, no 
infinite objects, no higher order variables, etc.) can be used to represent, formal- 
ize, and mechanically prove non-trivial theorems. In this paper, we place empha- 
sis on describing the formalization and representation aspects of our work. Due to 
the lack of space, we will skip details of the mechanical proofs. The complete
information is available on the web at http://www-cs.us.es/~jruiz/acl2-rewr.

1.1 The ACL2 System 

We briefly describe here the ACL2 theorem prover and its logic. The best in- 
troduction to ACL2 is [6]. To obtain more background on ACL2, see the ACL2 
user’s manual in [7]. A description of the main proof techniques used in Nqthm, 
also used in ACL2, can be found in [3]. 

ACL2 stands for A Computational Logic for Applicative Common Lisp. The 
ACL2 logic is a quantifier-free, first-order logic with equality, describing an ap- 
plicative subset of Common Lisp. The syntax of terms is that of Common Lisp 
[14] (we will assume that the reader is familiar with this language). The logic 
includes axioms for propositional logic and for a number of Lisp functions and 
data types. Rules of inference include those for propositional calculus, equality, 
and instantiation. By the principle of definition, new function definitions (using 
defun) are admitted as axioms only if there exists an ordinal measure in which 
the arguments of each recursive call decrease. This ensures that no inconsistencies
are introduced by new definitions. The theory has a constructive definition
of the ordinals up to $\varepsilon_0$, in terms of lists and natural numbers, given by the
predicate e0-ordinalp and the order e0-ord-<. One important rule of inference is
the principle of induction, which permits proofs by induction on $\varepsilon_0$.

In addition to the definition principle, the encapsulation mechanism (using 
encapsulate) allows the user to introduce new function symbols by axioms con- 
straining them to have certain properties (to ensure consistency, a witness local 
function having the same properties has to be exhibited). Inside an encapsulate, 
properties stated with defthm need to be proved for the local witnesses, and out- 
side, those theorems work as assumed axioms. The functions partially defined 
with encapsulate can be seen as second-order variables, representing functions 
with those properties. A derived rule of inference, functional instantiation, al- 
lows some kind of second-order reasoning: theorems about constrained functions 
can be instantiated with function symbols known to have the same properties. 

The ACL2 theorem prover is inspired by Nqthm, but has been considerably 
improved. The main proof techniques used by the prover are simplification and 
induction. Simplification is a combination of decision procedures, mainly term 
rewriting, using the rules previously proved by the user. The command defthm 
starts a proof attempt, and, if it succeeds, the theorem is stored as a rule. The 
theorem prover is automatic in the sense that once defthm is invoked, the user
can no longer interact with the system. However, in a deeper sense, the system is 
interactive. Very often, non-trivial proofs are not found by the system in the first 
attempt. The user has to guide the prover by adding lemmas and definitions, 
used in subsequent proofs as rules. The role of the user is important: a typical 
proof effort consists of formalizing the problem in the logic and helping the 
prover to find a preconceived proof by means of a suitable set of rewrite rules. 

1.2 Abstract Reductions and Term Rewriting Systems 

This section provides a short introduction to basic concepts and definitions from 
rewriting theory used in this paper. A complete description can be found in [1]. 

An abstract reduction is simply a binary relation $\rightarrow$ defined on a set $A$. We
will denote as $\leftarrow$, $\leftrightarrow$, $\rightarrow^*$ and $\leftrightarrow^*$ respectively the inverse relation, the symmetric
closure, the reflexive-transitive closure and the equivalence closure. The following
concepts are defined with respect to a reduction relation $\rightarrow$. An element $x$ is in
normal form (or irreducible) if there is no $z$ such that $x \rightarrow z$. We say that $x$ and
$y$ are joinable (denoted as $x \downarrow y$) if there exists $u$ such that $x \rightarrow^* u \mathrel{{}^*{\leftarrow}} y$. We say that
$x$ and $y$ are equivalent if $x \leftrightarrow^* y$.

An important property to study about reduction relations is the existence
of unique normal forms for equivalent objects. A reduction relation has the
Church-Rosser property if every two equivalent objects are joinable. An equivalent
property is confluence: for all $x, u, v$ such that $u \mathrel{{}^*{\leftarrow}} x \rightarrow^* v$, then $u \downarrow v$.
If a reduction has the Church-Rosser property, then two distinct normal forms
cannot be equivalent. If in addition the relation is normalizing (i.e. every element
has a normal form, noted as $x{\downarrow}$) then $x \leftrightarrow^* y$ iff $x{\downarrow} = y{\downarrow}$. Provided normal forms
are computable and identity in $A$ is decidable, then the equivalence relation $\leftrightarrow^*$
is decidable in this case, using a test for equality of normal forms.

Another important property is termination: a reduction relation is terminating
(or noetherian) if there is no infinite reduction sequence $x_0 \to x_1 \to x_2 \to \cdots$.
Obviously, every noetherian reduction is normalizing. The Church-Rosser property
can be localized when the reduction is terminating. In that case an equivalent
property is local confluence: for all x, u, v such that $u \leftarrow x \to v$, we have $u \downarrow v$.
This result is known as Newman’s lemma.
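
The definitions above can be tried out on a small finite example. The following Python sketch is only an illustration and is not part of the ACL2 development described in this paper; the relation STEP and all function names are assumptions chosen for the example. It checks irreducibility, joinability and local confluence by exhaustive search.

```python
# Illustrative sketch only: a finite abstract reduction given as a successor map.
# STEP and every function name below are hypothetical, chosen for this example.
STEP = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": set(),
    "e": {"d"},
}

def successors(x):
    """One-step reducts of x."""
    return STEP.get(x, set())

def is_normal_form(x):
    """x is irreducible if there is no z with x -> z."""
    return not successors(x)

def reachable(x):
    """All y with x ->* y (reflexive-transitive closure), by graph search."""
    seen, todo = {x}, [x]
    while todo:
        for z in successors(todo.pop()):
            if z not in seen:
                seen.add(z)
                todo.append(z)
    return seen

def joinable(x, y):
    """x and y are joinable if they have a common reduct."""
    return bool(reachable(x) & reachable(y))

def locally_confluent():
    """Every local peak u <- x -> v must be joinable."""
    return all(joinable(u, v)
               for x in STEP
               for u in successors(x)
               for v in successors(x))

def normal_forms(x):
    """The set of normal forms reachable from x."""
    return {y for y in reachable(x) if is_normal_form(y)}
```

Since STEP is terminating and locally confluent, Newman's lemma predicts that every element has a single normal form, so comparing normal forms decides equivalence for this toy relation.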

One important type of reduction relations is defined in the set $T(\Sigma, X)$ of
first order terms in a given language, where $\Sigma$ is a set of function symbols, and
X is a set of variables. In this context, an equation is a pair of terms $l = r$.
The reduction relation induced by a set of equations E is defined as follows:
$s \to_E t$ if there exist $l = r \in E$ and a substitution $\sigma$ of the variables in l
(the matching substitution) such that $\sigma(l)$ is a subterm of s and t is obtained
from s by replacing the subterm $\sigma(l)$ by $\sigma(r)$. This reduction relation is of great
interest in universal algebra because it can be proved that $E \models s = t$ iff $s \leftrightarrow^{*}_E t$.
This implies decidability of every equational theory defined by a set of axioms
E such that $\to_E$ is terminating and locally confluent. To emphasize the use of
the equation $l = r$ from left to right as described above, we write $l \to r$ and
talk about rewrite rules. A term rewriting system (TRS) is a set of rewrite rules.
Unless denoted otherwise, E is always a set of equations (equational axioms)
and R is a term rewriting system.
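
A single rewrite step as just defined (match the left-hand side against a subterm, then replace that subterm by the instantiated right-hand side) can be sketched in Python. This is an illustration under stated assumptions, not the paper's ACL2 library: the term representation and the names match, apply_subst, subterm, replace and rewrite_at are all hypothetical.

```python
# Illustrative sketch only: first-order terms, matching and one rewrite step.
# Representation (an assumption of this example): a variable is a str, and an
# application f(t1, ..., tn) is the tuple ("f", t1, ..., tn). A position is a
# tuple of argument indices, counted from 1 because index 0 holds the symbol.

def is_var(t):
    return isinstance(t, str)

def match(pattern, term, subst=None):
    """Return a substitution sigma with sigma(pattern) == term, or None."""
    subst = dict(subst or {})
    if is_var(pattern):
        if pattern in subst and subst[pattern] != term:
            return None
        subst[pattern] = term
        return subst
    if is_var(term) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst

def apply_subst(subst, t):
    """Instantiate the variables of t according to subst."""
    if is_var(t):
        return subst.get(t, t)
    return (t[0],) + tuple(apply_subst(subst, a) for a in t[1:])

def subterm(t, pos):
    """Subterm of t at a (legal) position pos."""
    for i in pos:
        t = t[i]
    return t

def replace(t, pos, new):
    """Replace the subterm of t at position pos by new."""
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i + 1:]

def rewrite_at(s, pos, rule):
    """Apply the rule l -> r at position pos of s; None if l does not match."""
    lhs, rhs = rule
    sigma = match(lhs, subterm(s, pos))
    if sigma is None:
        return None
    return replace(s, pos, apply_subst(sigma, rhs))
```

For instance, with the rule plus(0, X) -> X written as `(("plus", ("0",), "X"), "X")`, the call `rewrite_at(("s", ("plus", ("0",), ("s", ("0",)))), (1,), rule)` rewrites the argument of the outer s.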

Local confluence is decidable for finite and terminating TRSs: joinability has
only to be checked for a finite number of pairs of terms, called critical pairs,
accounting for the most general forms of local divergence. The critical pair
theorem states that a TRS is locally confluent iff all its critical pairs are joinable.
Thus, the Church-Rosser property of terminating TRSs is a decidable property:
it is enough to check if every critical pair has a common normal form. If a TRS
R has a critical pair with different normal forms, there is still a chance to obtain
a decision procedure for the equational theory of R, by adjoining that equation as
a new terminating rewrite rule. This is the basis for the well-known completion
algorithms (see [1] for details).

In the sequel, we describe the formalization of these properties in the ACL2
logic and a proof of them using the theorem prover. For the rest of the paper,
when we say “prove” we mean “mechanically prove using ACL2”.

2 Formalizing Abstract Reductions in ACL2 

Our first attempt to represent abstract reduction relations in the ACL2 logic 
was simply to define them as binary boolean functions, using encapsulate to 
state their properties. Nevertheless, if $x \to y$, more important than the relation
between x and y is the fact that y is obtained from x by applying some kind 
of transformation or operator. In its most abstract formulation, we can view 
a reduction as a binary function that, given an element and an operator, re- 
turns another object, performing a one-step reduction. Consider, for example, 
equational reductions: elements in that case are first-order terms and operators 
are the objects constituted by a position (indicating the subterm replaced), an 
equation (the rule applied) and a substitution (the matching substitution). 

Of course not any operator can be applied to any element. Thus, a second 
component in this formalization is needed: a boolean binary function to test 
if it is legal to apply an operator to an element. Finally, a third component 
is introduced: since computation of normal forms requires searching for legal 
operators to apply, we will need a unary function such that when applied to 
an element, it returns a legal operator, whenever it exists, or nil otherwise (a
reducibility test).

The above considerations lead us to formalize the concept of abstract reduc- 
tions in ACL2, using three partially defined functions: reduce-one-step, legal 
and reducible. This can be done with the following encapsulate (dots are used 
to omit technical details, as in the rest of the paper): 

(encapsulate
 ((legal (x u) t) (reduce-one-step (x u) t) (reducible (x) t))

 (defthm legal-reducible-1
   (implies (reducible x) (legal x (reducible x))))

 (defthm legal-reducible-2
   (implies (not (reducible x)) (not (legal x op))))

 ...)

The first line of every encapsulate is a signature description of the non- 
local functions partially defined. The two theorems assumed above as axioms 
are minimal requirements for every reduction we defined: if further properties
(for example, local confluence, confluence or noetherianity) are assumed, they
have to be stated inside the encapsulate. This is a very abstract framework
to formalize reductions in ACL2. We think that these three functions capture 
the basic abstract features every reduction has. On the one hand, a procedural 
aspect: the computation of normal forms, applying operators until irreducible 
objects are obtained. On the other hand, a declarative aspect: every reduction 
relation describes its equivalence closure. Representing reductions in this way, 
we can define concepts like Church-Rosser property, local confluence or noethe- 
rianity and even prove non-trivial theorems like Newman’s lemma, as we will 
see. 

To instantiate this general framework, concrete instances of reduce-one-step,
legal and reducible have to be defined and the properties assumed
here as axioms must be proved for those concrete definitions. By functional 
instantiation, results about abstract reductions can then be easily exported to 
concrete cases (as we will see for the equational case). 
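
As a hedged illustration of what such a concrete instance looks like, the Python sketch below instantiates the three components for a toy reduction on strings over {a, b}, where the rule "ba" -> "ab" may be applied at any index and an operator is simply that index. The toy relation and the function bodies are assumptions of this example (only the three component names echo the text), and Python's None plays the role of nil.

```python
# Illustrative sketch only: the three components of an abstract reduction,
# instantiated for a toy relation on strings over {a, b} where the rule
# "ba" -> "ab" may be applied at any index. An operator is the index used.

def legal(x, op):
    """Is it legal to apply operator op (an index) to element x?"""
    return isinstance(op, int) and 0 <= op and x[op:op + 2] == "ba"

def reduce_one_step(x, op):
    """Perform the one-step reduction of x with operator op."""
    return x[:op] + "ab" + x[op + 2:]

def reducible(x):
    """Return some legal operator for x, or None if x is irreducible."""
    i = x.find("ba")
    return None if i < 0 else i

def normal_form(x):
    """Apply legal operators until an irreducible element is reached."""
    op = reducible(x)
    while op is not None:
        x = reduce_one_step(x, op)
        op = reducible(x)
    return x
```

The two properties assumed as axioms in the encapsulate correspond here to checkable facts: whenever `reducible(x)` is not None it is a legal operator for x, and when it is None no index is legal. Note that the result of `reducible` must be compared against None explicitly, since the operator 0 is a legal index.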

2.1 Equivalence and Proofs 

Due to the constructive nature of the ACL2 logic, in order to define $x \leftrightarrow^{*} y$
we have to include an argument with a sequence of steps
$x = x_0 \leftrightarrow x_1 \leftrightarrow x_2 \cdots \leftrightarrow x_n = y$. This is done by the function equiv-p defined in figure 1.
(equiv-p x y p) is t if p is a proof justifying that $x \leftrightarrow^{*} y$. A proof is a sequence
of legal steps and each proof step is a structure r-step with four fields: elt1,
elt2 (the elements connected), direct (the direction of the step) and operator.
Two proofs justifying the same equivalence will be said to be equivalent. A proof
step is legal (as defined by proof-step-p) if one of its elements is obtained
applying the (legal) operator to the other (in the sense indicated by direct).

The Church-Rosser property and local confluence can be redefined with respect
to the form of a proof (subsections 2.2 and 2.3). For that purpose, we define
(omitted here) functions to recognize proofs with particular shapes (valleys
and local peaks): local-peak-p recognizes proofs of the form $v \leftarrow x \to u$ and
steps-valley recognizes proofs of the form $v \xrightarrow{*} x \xleftarrow{*} u$.
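
The idea of a proof as a checkable object can be sketched in Python for a toy reduction on strings ("ba" -> "ab" at a given index). This is an illustration only: the toy relation is an assumption, and the bodies of local_peak_p and steps_valley are guesses at the shapes described above, since the paper omits the ACL2 definitions.

```python
from collections import namedtuple

# Illustrative sketch only, following the shape of figure 1 for a toy
# reduction on strings ("ba" -> "ab" at a given index). All definitions are
# assumptions of this example, not the paper's ACL2 code.
RStep = namedtuple("RStep", "direct operator elt1 elt2")

def legal(x, op):
    return op >= 0 and x[op:op + 2] == "ba"

def reduce_one_step(x, op):
    return x[:op] + "ab" + x[op + 2:]

def proof_step_p(s):
    """A step is legal if one end is obtained from the other by the operator."""
    src, dst = (s.elt1, s.elt2) if s.direct else (s.elt2, s.elt1)
    return legal(src, s.operator) and reduce_one_step(src, s.operator) == dst

def equiv_p(x, y, p):
    """p is a chain of legal steps connecting x to y."""
    for s in p:
        if not (proof_step_p(s) and s.elt1 == x):
            return False
        x = s.elt2
    return x == y

def local_peak_p(p):
    """Exactly two steps: an inverse step followed by a direct one."""
    return len(p) == 2 and not p[0].direct and p[1].direct

def steps_valley(p):
    """All direct steps come before all inverse steps."""
    directions = [s.direct for s in p]
    return directions == sorted(directions, reverse=True)
```

With x = "baba", the two steps at indices 0 and 2 form a local peak between "abba" and "baab", and the two-step proof through "abab" is an equivalent valley.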

2.2 Church-Rosser Property and Decidability 

We describe how we formalized and proved the fact that every Church-Rosser 
and normalizing reduction relation is decidable. Valley proofs can be used to 

[Footnote: proofs in this sense are sequences of reduction steps; do not confuse them with proofs done using the ACL2 system.]

(defstructure r-step direct operator elt1 elt2)

(defun proof-step-p (s)
  (let ((e1 (elt1 s)) (e2 (elt2 s)) (op (operator s)) (dt (direct s)))
    (and (r-step-p s)
         (implies dt (and (legal e1 op)
                          (equal (reduce-one-step e1 op) e2)))
         (implies (not dt) (and (legal e2 op)
                                (equal (reduce-one-step e2 op) e1))))))

(defun equiv-p (x y p)
  (if (endp p)
      (equal x y)
    (and (proof-step-p (car p))
         (equal x (elt1 (car p)))
         (equiv-p (elt2 (car p)) y (cdr p)))))



Fig. 1. Definition of proofs and equivalence 



reformulate the definition of the Church-Rosser property: a reduction is Church- 
Rosser iff for every proof there exists an equivalent valley proof. Since the ACL2 
logic is quantifier-free, the existential quantifier in this statement has to be replaced
by a Skolem function, which we call transform-to-valley. The concept
of being normalizing can also be reformulated in terms of proofs: a reduction is 
normalizing if for every element there exists a proof to an equivalent irreducible 
element. This proof is given by the (Skolem) function proof-irreducible (note 
that we are not assuming noetherianity yet). Properties defining a Church-Rosser 
and normalizing reduction are encapsulated as shown in figure 2, item (a). 

The function r-equiv tests if normal forms are equal. Note that the normal 
form of an element x is the last element of (proof-irreducible x): 

(defun normal-form (x)
  (last-of-proof x (proof-irreducible x)))

(defun r-equiv (x y)
  (equal (normal-form x) (normal-form y)))

To prove decidability of a Church-Rosser and normalizing relation, it is 
enough to prove that r-equiv is a complete and sound algorithm deciding the 
equivalence relation associated with the reduction relation. See figure 2, item 
(b). We also include the main lemma used, stating that there are no distinct 
equivalent irreducible elements. Note also that soundness is expressed in terms 
of a Skolem function make-proof-common-normal-form (definition omitted), 
which constructs a proof justifying the equivalence. These theorems are proved 
easily, without much guidance from the user. See the web page for details.

;;; (a) Definition of Church-Rosser and normalizing reduction:
(encapsulate
 ((legal (x u) t) (reduce-one-step (x u) t) (reducible (x) t)
  (transform-to-valley (x) t) (proof-irreducible (x) t))

 (defthm Church-Rosser-property
   (let ((valley (transform-to-valley p)))
     (implies (equiv-p x y p)
              (and (steps-valley valley) (equiv-p x y valley)))))

 (defthm normalizing
   (let* ((p-x-y (proof-irreducible x))
          (y (last-of-proof x p-x-y)))
     (and (equiv-p x y p-x-y) (not (reducible y))))))

;;; (b) Main theorems proved:

(defthm if-C-R--two-ireducible-connected-are-equal
  (implies (and (equiv-p x y p) (not (reducible x)) (not (reducible y)))
           (equal x y)))

(defthm r-equiv-sound
  (implies (r-equiv x y) (equiv-p x y (make-proof-common-n-f x y))))

(defthm r-equiv-complete
  (implies (equiv-p x y p) (r-equiv x y)))



Fig. 2. Church-Rosser and normalizing implies decidability 
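
The decision procedure behind figure 2 can be sketched in Python for the toy reduction "ba" -> "ab" on strings. This is an illustration under stated assumptions and not the ACL2 code: the toy relation, the step representation and the body of make_proof_common_normal_form are hypothetical (the paper omits the definition of that function), and inverse_proof is a helper introduced only for this example.

```python
from collections import namedtuple

# Illustrative sketch only: deciding equivalence by comparing normal forms,
# for the toy reduction "ba" -> "ab" on strings. Names are hypothetical.
RStep = namedtuple("RStep", "direct operator elt1 elt2")

def reducible(x):
    i = x.find("ba")
    return None if i < 0 else i

def reduce_one_step(x, op):
    return x[:op] + "ab" + x[op + 2:]

def proof_irreducible(x):
    """A proof (list of direct steps) from x to an irreducible element."""
    proof = []
    op = reducible(x)
    while op is not None:
        y = reduce_one_step(x, op)
        proof.append(RStep(True, op, x, y))
        x, op = y, reducible(y)
    return proof

def last_of_proof(x, p):
    """Last element reached by proof p starting at x."""
    return p[-1].elt2 if p else x

def normal_form(x):
    return last_of_proof(x, proof_irreducible(x))

def r_equiv(x, y):
    """Decide equivalence by equality of normal forms."""
    return normal_form(x) == normal_form(y)

def inverse_proof(p):
    """Reverse a proof: swap endpoints and flip the direction of each step."""
    return [RStep(not s.direct, s.operator, s.elt2, s.elt1) for s in reversed(p)]

def make_proof_common_normal_form(x, y):
    """If r_equiv(x, y) holds, this valley proof connects x to y."""
    return proof_irreducible(x) + inverse_proof(proof_irreducible(y))
```

The soundness direction is constructive in the same spirit as the text: when the normal forms coincide, the proof from x down to the common normal form is followed by the reversed proof from y, which gives an explicit witness of the equivalence.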



2.3 Noetherianity, Local Confluence, and Newman’s Lemma 

A relation is well founded on a set A if every non-empty subset has a minimal 
element. A restricted notion of well-foundedness is built into ACL2, based on 
the following meta-theorem: a relation on a set A is well-founded iff there exists
a function $F : A \to Ord$ such that $x < y \Rightarrow F(x) < F(y)$, where Ord is the class
of all ordinals (axiom of choice needed). In ACL2, once a relation is proved to
satisfy these requirements, it can be used in the admissibility test for recursive
functions. A general well-founded partial order rel can be defined in ACL2 as
shown in item (a) of figure 3. Since only ordinals up to $\varepsilon_0$ are formalized in the
ACL2 logic, a limitation is imposed in the maximal order type of well-founded
relations that can be represented. Consequently, our formalization suffers from
the same restriction. Nevertheless, no particular properties of $\varepsilon_0$ are used in our
proofs, except well-foundedness, so we think the same formal proofs could be
carried out if higher ordinals were involved.






Formalizing Rewriting in the ACL2 Theorem Prover 






;;; (a) A well-founded partial order:

(encapsulate
 ((rel (x y) t) (fn (x) t))

 (defthm rel-well-founded-relation
   (and (e0-ordinalp (fn x))
        (implies (rel x y) (e0-ord-< (fn x) (fn y))))
   :rule-classes :well-founded-relation)

 (defthm rel-transitive
   (implies (and (rel x y) (rel y z)) (rel x z))))

;;; (b) A noetherian and locally confluent reduction relation:

(encapsulate
 ((legal (x u) t) (reduce-one-step (x u) t)
  (reducible (x) t) (transform-local-peak (x) t))

 (defthm locally-confluent
   (let ((valley (transform-local-peak p)))
     (implies (and (equiv-p x y p) (local-peak-p p))
              (and (steps-valley valley)
                   (equiv-p x y valley)))))

 (defthm noetherian
   (implies (legal x u) (rel (reduce-one-step x u) x))))

;;; (c) Definition of transform-to-valley:

(defun transform-to-valley (p)
  (declare (xargs :measure (proof-measure p)
                  :well-founded-relation mul-rel))
  (if (not (exists-local-peak p))
      p
    (transform-to-valley (replace-local-peak p))))

;;; (d) Main theorems proved:

(defthm transform-to-valley-admission
  (implies (exists-local-peak p)
           (mul-rel (proof-measure (replace-local-peak p))
                    (proof-measure p))))

(defthm Newman-lemma
  (let ((valley (transform-to-valley p)))
    (implies (equiv-p x y p)
             (and (steps-valley valley)
                  (equiv-p x y valley)))))



Fig. 3. Newman’s lemma

In item (b) of figure 3, a general definition of a noetherian and locally confluent
reduction relation is presented. Local confluence is easily expressed in terms
of the shape of proofs involved: a relation is locally confluent iff for every local
peak proof there is an equivalent valley proof. This valley proof is given by the
partially defined function transform-local-peak. As for noetherianity, our formalization
relies on the following meta-theorem: a reduction is noetherian if and
only if it is contained in a well-founded partial ordering (AC). Thus, the general
well-founded relation rel previously defined is used to justify noetherianity of
the general reduction relation defined: for every element x to which a legal
operator u can be applied, reduce-one-step obtains an element less than
x with respect to rel.

The standard proof of Newman’s lemma found in the literature (see [1]) shows 
confluence by noetherian induction based on the reduction relation. The proof 
we obtained in ACL2 differs from the standard one and it is based on the proof 
given in [8]. In our formalization, we have to show that the reduction relation
has the Church-Rosser property by defining a function transform-to-valley
and proving that for every proof p, (transform-to-valley p) is an equivalent
valley proof.

This function is defined to iteratively apply replace-local-peak (which
replaces a local peak subproof by the equivalent proof given by
transform-local-peak) until there are no local peaks. See definition in item (c) of figure 3.

Induction used in the standard proof is hidden here by the termination proof
of transform-to-valley, needed for admission. The main proof effort was to
show that in each iteration, some measure on the proof, proof-measure, decreases
with respect to a well-founded relation, mul-rel. This can be seen as a
normalization process acting on proofs. The measure proof-measure is the list
of elements involved in the proof and the relation mul-rel is defined to be the
multiset extension of rel. We needed to prove in ACL2 that the multiset extension
of a well-founded relation is also well-founded, a result interesting in its own
right (see the web page for details). Once transform-to-valley is admitted, it
is relatively easy to show that it always returns an equivalent valley proof. See
item (d) of figure 3.
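
The normalization of proofs by repeated replacement of local peaks can be sketched in Python for the toy reduction "ba" -> "ab" on strings. This is only an illustration under stated assumptions: transform_local_peak below is written by hand for this particular relation (two distinct redexes in a string never overlap here), find_local_peak is a hypothetical helper, and termination is observed rather than proved.

```python
from collections import namedtuple

# Illustrative sketch only: normalizing proofs by replacing local peaks,
# for the toy reduction "ba" -> "ab" on strings. Names are hypothetical and
# transform_local_peak is specific to this toy relation.
RStep = namedtuple("RStep", "direct operator elt1 elt2")

def reduce_one_step(x, op):
    return x[:op] + "ab" + x[op + 2:]

def transform_local_peak(peak):
    """Valley proof equivalent to the local peak u <- x -> v."""
    left, right = peak
    i, j = left.operator, right.operator
    if i == j:                      # both steps coincide, so u == v
        return []
    u, v = left.elt1, right.elt2
    w = reduce_one_step(u, j)       # redexes are disjoint: both orders give w
    return [RStep(True, j, u, w), RStep(False, i, w, v)]

def find_local_peak(p):
    """Index of the first inverse step followed by a direct step, or None."""
    for k in range(len(p) - 1):
        if not p[k].direct and p[k + 1].direct:
            return k
    return None

def replace_local_peak(p, k):
    """Replace the local peak at index k by its valley proof."""
    return p[:k] + transform_local_peak(p[k:k + 2]) + p[k + 2:]

def transform_to_valley(p):
    """Replace local peaks until none is left."""
    k = find_local_peak(p)
    while k is not None:
        p = replace_local_peak(p, k)
        k = find_local_peak(p)
    return p

def proof_measure(p):
    """List of the elements involved in the proof."""
    return [p[0].elt1] + [s.elt2 for s in p] if p else []
```

Each replacement removes the top element of a peak from the measure and adds only elements obtained from it by reduction, which is the intuition behind comparing measures with a multiset extension of the ordering.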

Note that we gave a particular “implementation” of transform-to-valley
and proved as theorems the properties assumed as axioms in the previous subsection.
The same was done with proof-irreducible. Decidability of noetherian
and locally confluent reduction relations can now be easily deduced by functional
instantiation from the general results proved in the previous subsection, allowing
some kind of second-order reasoning. Name conflicts are avoided by using
Common Lisp packages.



3 Formalizing Rewriting in ACL2 

We defined in the previous section a very general formalization of reduction 
relations. The results proved can be reused for every instance of the general
framework. As a major example, we describe in this section how we formalized
and reasoned about term rewriting in ACL2. 

Since rewriting is a reduction relation defined on the set of first order terms, 
we needed to use a library of definitions and theorems formalizing the
lattice-theoretic properties of first-order terms: in particular, matching and unification
algorithms are defined and proved correct. See [12] for details of this work. Some 
functions of this library will be used in the following. Although definitions are 
not given here, their names suggest what they do. 

The very abstract concept of operator can be instantiated for term rewriting 
reductions. Equational operators are structures with three fields, containing the 
rewriting rule to apply, the position of the subterm to be replaced and the 
matching substitution: (defstructure eq-operator rule pos matching).

As we said in section 2, every reduction relation is given by concrete versions 
of legal, reduce-one-step and reducible. In the equational case: 

- (eq-legal term op R) tests if the rule of the operator op is in R, and
  can be applied to term at the position indicated by the operator (using the
  matching in op).

- (eq-reduce-one-step term op) replaces the subterm indicated by the position
  of the operator op, by the corresponding instance (using matching) of
  the right-hand side of the rule of the operator.

- (eq-reducible term R) returns a legal equational operator to apply, whenever
  it exists, or nil otherwise.

Note that for every fixed term rewriting system R a particular reduction relation
is defined. The rewriting counterpart of the abstract equivalence equiv-p
can be defined in an analogous way: (eq-equiv-p t1 t2 p R) tests if p is a
proof of the equivalence of t1 and t2 in the equational theory of R. Due to the
lack of space, we do not give the definitions here. Recall also from section 2 that
two theorems (assumed as axioms in the general framework) have to be proved
to state the relationship between eq-legal and eq-reducible. We proved them:

(defthm eq-reducible-legal-1
  (implies (eq-reducible term R)
           (eq-legal term (eq-reducible term R) R)))

(defthm eq-reducible-legal-2
  (implies (not (eq-reducible term R))
           (not (eq-legal term op R))))

Formalizing term rewriting in this way, we proved a number of results about 
term rewriting systems. In the following subsections, two relevant examples are 
sketched.

(defthm eq-equiv-p-reflexive (eq-equiv-p term term nil E))

(defthm eq-equiv-p-symmetric
  (implies (eq-equiv-p t1 t2 p E)
           (eq-equiv-p t2 t1 (inverse-proof p) E)))

(defthm eq-equiv-p-transitive
  (implies (and (eq-equiv-p t1 t2 p E) (eq-equiv-p t2 t3 q E))
           (eq-equiv-p t1 t3 (proof-concat p q) E)))

(defthm eq-equiv-p-stable
  (implies (eq-equiv-p t1 t2 p E)
           (eq-equiv-p (instance t1 sigma) (instance t2 sigma)
                       (eq-proof-instance p sigma) E)))

(defthm eq-equiv-p-compatible
  (implies (and (eq-equiv-p t1 t2 p E) (positionp pos term))
           (eq-equiv-p (replace-term term pos t1)
                       (replace-term term pos t2)
                       (eq-proof-context p term pos) E)))



Fig. 4. Congruence: an algebra of proofs 



3.1 Equational Theories and an Algebra of Proofs 

An equivalence relation on first-order terms is a congruence if it is stable (closed
under instantiation) and compatible (closed under inclusion in contexts). Equational
consequence, $E \models s = t$, can be alternatively defined as the least congruence
relation containing E. In order to justify that the representation described above
is appropriate, it would be suitable to prove that, for a given E, the relation
established by (eq-equiv-p t1 t2 p E) is the least congruence containing E
(formally speaking, p has to be understood as existentially quantified).

We proved it in ACL2. In figure 4 we sketch part of our formalization showing 
that eq-equiv-p is a congruence. The ACL2 proof obtained is a good example 
of the benefits gained by considering proofs as objects that can be transformed 
to obtain new proofs. Following Bachmair [2], we can define an “algebra” of 
proofs, a set of operations acting on proofs: proof-concat to concatenate proofs, 
inverse-proof to obtain the reverse proof, eq-proof-instance, to instantiate 
the elements involved in the proof and eq-proof-context to include the ele- 
ments of the proof as subterms of a common term. The empty proof nil can be 
seen as a proof constant. Each of these operations corresponds with one of the 
properties needed to show that eq-equiv-p is a congruence. The theorems are 
proved easily by ACL2, with minor help from the user.

;;; (a) A TRS with joinable critical pairs

(encapsulate
 ((RLC () t) (transform-cp (l1 r1 pos l2 r2) t))

 (defthm RLC-joinable-critical-pairs
   (implies
    (and (member (cons l1 r1) (RLC)) (member (cons l2 r2) (RLC))
         (positionp pos l1) (not (variable-p (occurence l1 pos))))
    (let* ((cp-r (cp-r l1 r1 pos l2 r2)))
      (implies cp-r
               (and (eq-equiv-p (lhs cp-r) (rhs cp-r)
                                (transform-cp l1 r1 pos l2 r2) (RLC))
                    (steps-valley (transform-cp l1 r1 pos l2 r2))))))))

;;; (b) Theorem proved:

(defun transform-eq-local-peak (p) ...)

(defthm critical-pair-theorem
  (let ((valley (transform-eq-local-peak p)))
    (implies (and (eq-equiv-p t1 t2 p (RLC)) (local-peak-p p))
             (and (steps-valley valley)
                  (eq-equiv-p t1 t2 valley (RLC))))))



Fig. 5. The critical pair theorem 



3.2 The Critical Pair Theorem 

The main result we have proved is the critical pair theorem: a rewrite system 
R is locally confluent iff every critical pair obtained with rules in R is joinable. 
This result is formalized in our framework and proved guiding the system to the 
classical proof given in the literature (see [1] for example). 

In item (a) of figure 5, a term rewriting system (RLC) is partially defined
assuming the property of joinability of its critical pairs. The partially defined
function (transform-cp l1 r1 pos l2 r2) is assumed to obtain a valley proof
for the critical pair determined by the rules (l1 . r1) and (l2 . r2) at the
non-variable position pos of l1. The function (cp-r l1 r1 pos l2 r2) computes
such a critical pair, whenever it exists (after prior renaming of the variables
of the rules, in order to get them standardized apart).
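
The computation of a critical pair (rename the rules apart, unify a non-variable subterm of one left-hand side with the other left-hand side, then instantiate both reducts) can be sketched in Python. This is an illustration under stated assumptions and not the paper's cp-r: the term representation, the renaming by suffixes and the textbook unification routine are choices made for this example.

```python
# Illustrative sketch only: computing a critical pair of two rewrite rules.
# Representation (an assumption of this example): a variable is a str, an
# application f(t1, ..., tn) is the tuple ("f", t1, ..., tn). Positions are
# tuples of argument indices counted from 1.

def is_var(t):
    return isinstance(t, str)

def apply_subst(s, t):
    """Apply the (triangular) substitution s to t, resolving chains."""
    if is_var(t):
        return apply_subst(s, s[t]) if t in s else t
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])

def occurs(v, t, s):
    t = apply_subst(s, t)
    return t == v if is_var(t) else any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s=None):
    """Most general unifier of a and b extending s, or None."""
    s = {} if s is None else s
    a, b = apply_subst(s, a), apply_subst(s, b)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def rename(t, suffix):
    """Rename variables so that two rules share none (standardizing apart)."""
    if is_var(t):
        return t + suffix
    return (t[0],) + tuple(rename(a, suffix) for a in t[1:])

def subterm(t, pos):
    for i in pos:
        t = t[i]
    return t

def replace(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i + 1:]

def critical_pair(l1, r1, pos, l2, r2):
    """Critical pair from overlapping l2 -> r2 into l1 -> r1 at pos, or None."""
    l1, r1 = rename(l1, "_1"), rename(r1, "_1")
    l2, r2 = rename(l2, "_2"), rename(r2, "_2")
    sub = subterm(l1, pos)
    if is_var(sub):
        return None
    sigma = unify(sub, l2)
    if sigma is None:
        return None
    return (apply_subst(sigma, replace(l1, pos, r2)), apply_subst(sigma, r1))
```

For example, overlapping f(i(X), X) -> e into the associativity rule f(f(X, Y), Z) -> f(X, f(Y, Z)) at position (1,) yields a pair relating f(e, Z) to f(i(X), f(X, Z)), up to the renaming of variables.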

To prove the critical pair theorem in our formalization, we have to define a
function transform-eq-local-peak and prove that it transforms every equational
local peak proof to an equivalent valley proof. The final theorem is shown
in item (b) of figure 5. The ACL2 proof of this theorem is the largest proof we
developed. Due to the lack of space, we cannot describe here the proof effort.
We urge the interested reader to see the web page.

This theorem and the theorems described in Section 2 for abstract reduction
relations were used to prove that equational theories described by a terminating 
TRS such that every critical pair has a common normal form are decidable. This 
result (which some authors call the Knuth-Bendix theorem) is easily obtained by 
functional instantiation from the abstract case, taking advantage of the fact that 
the whole formalization is done in the same framework. Note how this last result 
can be used to “certify” decision procedures for equational theories defined by 
confluent and terminating TRSs. 



4 Conclusions and Further Work 

We have presented a case study of using the ACL2 system as a metalanguage to 
formalize properties of object proof systems (abstract reductions and equational 
logic) in it. It should be stressed that the task of proving in ACL2 is not trivial. 
As claimed in [6], difficulties come from “the complexity of the whole enterprise of
formal proofs”, rather than from the complexity of ACL2. A typical proof effort
consists of formalizing the problem, and guiding the prover to a preconceived
“hand proof”, by decomposing the proof into intermediate lemmas. Most of our
lemmas are proved mainly by simplification and induction, without hints from 
the user. If one lemma is not proved in a first attempt, then additional lemmas 
are often needed, as suggested by inspecting the failed proof (for example, the 
proof of the critical pair theorem needed more than one hundred lemmas and 
fifty auxiliary definitions). Nevertheless, proofs can be simpler if a good library 
of previous results (books in the ACL2 terminology) is used. We think our work
provides a good collection of books to be reused in further verification efforts. 
Our formalization has the following main features: 

- Reduction relations and their properties are stated in a very general framework,
  as explained in section 2.

- The concept of proof is a key notion in our formalization. Proofs are treated
  as objects that can be transformed to obtain new proofs.

- Functional instantiation is extensively used as a way of exporting results
  from the abstract case to the concrete case of term rewriting systems.

Some related work has been done in the formalization of abstract reduction
relations in other theorem proving systems, mostly as part of formalizations
of the $\lambda$-calculus, for example by Huet [5] in the Coq system or Nipkow [11] in
Isabelle/HOL. A comparison is difficult because our goal was different and, more
importantly, the logics involved are significantly different: the ACL2 logic is much
weaker than those of Coq or HOL. More closely related work is Shankar [13],
using Nqthm. Although his work is on the concrete reduction relation from the
$\lambda$-calculus and he does not deal with the abstract case, some of his ideas are
reflected in our work.

To our knowledge, no formalization of term rewriting systems has been done 
yet and consequently the formal proofs of their properties presented here are the 
first ones we know of that have been performed using a theorem prover.

We think the results presented here are important for two reasons. From a
theoretical point of view, it is shown that a very weak logic can be used to for- 
malize properties of TRSs. From a practical point of view, this is an example of 
how formal methods can help in the design of symbolic computation systems. 
Usually, algebraic techniques are applied to the design of proof procedures in 
automated deduction. We show how benefits can be obtained in the reverse di- 
rection: automated deduction used as a tool to “certify” components of symbolic 
computation systems. Since ACL2 is also a programming language, this paper 
shows how computing and proving tasks can be mixed. Although a fully verified 
computer algebra system is currently beyond our possibilities, the guard verifi- 
cation mechanism [6] can be used to obtain verified Lisp code (executable in any 
compliant Common Lisp) for some basic procedures of term rewriting systems. 
There are also several ways in which this work can be extended. For example: 

— In order to obtain certified decision procedures for some equational theories 
(or for the word problem of some finitely presented algebras) work has to be 
done to formalize in ACL2 well-known terminating term orderings (recursive 
path orderings, Knuth-Bendix orderings, etc.). Maybe some problems will 
arise due to the restricted notion of noetherianity supported by ACL2. 

— The work presented in [10] suggests another application of this work: other 
theorem provers can be combined with ACL2 in order to obtain mechanically 
verified decision algorithms for some equational theories. 

— Our goal in the long term is to obtain a certified completion procedure 
written in Common Lisp. Although for the moment this may be far from 
the current status of our development, we think the work presented here is 
a good starting point.

Abstract. This paper is a brief continuation of earlier work by the same
authors [4] and [5] that deals with the concepts of conjecture, hypothesis
and consequence in orthocomplemented complete lattices. It considers
only the following three points:

1. Classical logic theorems of both deduction and contradiction are reinterpreted
and proved by means of one specific operator $C_\wedge$ defined
in [4].

2. Having shown that there is reason to consider the set $C_\wedge(P)$ of
consequences of a set of premises P as too large, it is proven that
$C_\wedge(P)$ is the largest set of consequences that can be assigned to P
by means of a Tarski consequences operator, provided that L is a
Boolean algebra.

3. On the other hand, it is proven that, also in a Boolean algebra, the
set $\Phi_\wedge(P)$ of strict conjectures is the smallest of any $\Phi(P)$ such that
$P \subseteq \Phi(P)$ and that if $P \subseteq Q$ then $\Phi(Q) \subseteq \Phi(P)$.

Keywords Conjectures, Hypotheses, Consequences, Boolean Algebras, Deduction,
Contradiction.

1 Introduction 

Let L be an orthocomplemented complete lattice with operations $\cdot$ (meet), $+$
(join), $'$ (complement), minimum 0 and maximum 1. We will consider subsets
(of premises) $P \subseteq L$ such that $\bigwedge P \neq 0$ (see footnote 1) and designate the respective family
$\{P \in \mathcal{P}(L);\ p_\wedge \neq 0\}$ as $\mathcal{P}_0(L)$. It is obvious that there are no contradictory pairs
$p_1, p_2$ in P, that is pairs $p_1, p_2$ such that $p_1 \le p_2'$. Pursuant to [4], we will deal with
the operators $\Phi_\vee : \mathcal{P}_0(L) \to \mathcal{P}(L)$, $\Phi_\wedge : \mathcal{P}_0(L) \to \mathcal{P}(L)$, $H_\wedge : \mathcal{P}_0(L) \to \mathcal{P}(L)$ and
$C_\wedge : \mathcal{P}_0(L) \to \mathcal{P}_0(L)$ defined respectively as:

$\Phi_\vee(P) = \{q \in L - \{0\};\ p_\vee \le q'\}^c$ (loose conjectures of P)

$\Phi_\wedge(P) = \{q \in L - \{0\};\ p_\wedge \le q'\}^c$ (strict conjectures of P)

$H_\wedge(P) = \{q \in L - \{0\};\ q \le p_\wedge\}$ (hypotheses of P)

$C_\wedge(P) = \{q \in L - \{0\};\ p_\wedge \le q\}$ (consequences of P)

for any $P \in \mathcal{P}_0(L)$ and where c is the complement of subsets in $\mathcal{P}(L)$.

* Paper partially supported by Spanish Ministry of Education and Culture under
projects PB98-1379-C02-C02 and CICYT-TIC99-1151
1 $\bigwedge P = Inf(P) = p_\wedge$ and $\bigvee P = Sup(P) = p_\vee$

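As it was proven in [4], $P \subseteq C_\wedge(P) \subseteq \Phi_\wedge(P) \subseteq \Phi_\vee(P)$, $H_\wedge(P) \subseteq \Phi_\wedge(P) \subseteq \Phi_\vee(P)$
and $C_\wedge(P) \cap H_\wedge(P) = \{p_\wedge\}$. If $P \subseteq Q$, then $\Phi_\vee(P) \subseteq \Phi_\vee(Q)$, $\Phi_\wedge(Q) \subseteq \Phi_\wedge(P)$,
$H_\wedge(Q) \subseteq H_\wedge(P)$ and $C_\wedge(P) \subseteq C_\wedge(Q)$; that is, loose conjectures and
consequences are monotonic, but strict conjectures and hypotheses are
anti-monotonic.

For each $P \in \mathcal{P}_0(L)$, the set $P' = \{p';\ p \in P\}$ verifies $Inf(P') = (Sup(P))' \neq 0$
if and only if $Sup(P) \neq 1$. Let $\mathcal{P}_{01}(L) = \{P \in \mathcal{P}_0(L);\ Sup(P) \neq 1\}$ and
suppose we designate the restrictions on $\mathcal{P}_{01}$ of $\Phi_\vee$, $\Phi_\wedge$, $H_\wedge$ and $C_\wedge$ using the
same symbols. Obviously, $H_\wedge(P) = \{q \in L - \{0\};\ Sup(P') = (Inf(P))' \le q'\}$.

Remark. It is quite clear that $C_\wedge(P)$ is always a filter that, generally, is not
prime, and that $H_\wedge(P) \cup \{0\} = \{q \in L \mid q \le p_\wedge\}$ is always an ideal that,
generally, is not prime either. Neither $\Phi_\vee(P)$ nor $\Phi_\wedge(P)$ are either filters or
ideals.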

2 The Classical Theorems of Deduction and 
Contradiction Reconsidered 

In this section, the classical theorems of deduction and contradiction are reviewed 
and it is found that only one part of each one depends on both distributivity 
and a specific operation of implication. 



2.1 Theorem of Deduction 

Let L be an orthocomplemented complete lattice, $P \in \mathcal{P}_0(L)$, and $\to$ an operation
of implication in L. So, $\to$ is such that $a \cdot (a \to b) \le b$ for all $a, b \in L$. If L is a
Boolean algebra, this property is equivalent to $a \to b \le a' + b$, and the material
implication $a \to b = a' + b$ is the greatest.

Under such conditions, we will split this theorem into two parts that require 
different additional conditions: 

Lemma 1 $\forall a, b \in L : a \to b \in C_\wedge(P) \Rightarrow b \in C_\wedge(P \cup \{a\})$,

which stands for “b is a consequence of P extended with a if $a \to b$ is a
consequence of P”.

Proof: Provided that $a \to b \in C_\wedge(P)$ then:

$p_\wedge \le a \to b$ (by definition),
$a \cdot p_\wedge \le a \cdot (a \to b)$ (monotonicity of $\cdot$),
$a \cdot (a \to b) \le b$ (definition of $\to$),
$a \cdot p_\wedge \le b$ (transitivity of $\le$).

As $\wedge(P \cup \{a\}) = a \cdot p_\wedge$, it follows that $\wedge(P \cup \{a\}) \le b$, which is the
definition for $b \in C_\wedge(P \cup \{a\})$.



Lemma 2 If L is a distributive lattice and the operator $\to$ is defined as $a \to
b = a' + b$ (material implication), then:

$\forall a, b \in L : b \in C_\wedge(P \cup \{a\}) \Rightarrow a \to b \in C_\wedge(P)$,

which stands for “$a \to b$ is a consequence of P if b is a consequence of P extended
with a”.

Proof: Provided that $b \in C_\wedge(P \cup \{a\})$, then:

$a \cdot p_\wedge \le b$ (by definition),
$a' + (a \cdot p_\wedge) \le a' + b$ (monotonicity of +),
$(a' + a) \cdot (a' + p_\wedge) \le a' + b$ (distributivity),
$a' + p_\wedge \le a' + b$ (since $a' + a = 1$),
$p_\wedge \le a' + b$ (transitivity and $p_\wedge \le a' + p_\wedge$),
$p_\wedge \le a \to b$ (by definition of $\to$),
$a \to b \in C_\wedge(P)$ (by definition).

Remark. Lemma 2 requires the implication $a \to b = a' + b$. The counterexamples
later in Section 2.1 show that it cannot be generalized to all implications.

Theorem 1 Let L be a complete Boolean algebra and $a \to b = a' + b$, then:

$\forall a, b \in L : a \to b \in C_\wedge(P) \Leftrightarrow b \in C_\wedge(P \cup \{a\})$,

which is a generalization of the first-order logic theorem of deduction.

Proof:

Immediate, as a complete Boolean algebra is nothing other than a
distributive orthocomplemented and complete lattice.
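
As an illustration, the equivalence in Theorem 1 can be tested exhaustively on a small complete Boolean algebra. The sketch below makes two assumptions of its own: L is the algebra of subsets of {0, 1, 2}, and only those a are tried for which $P \cup \{a\}$ stays in $\mathcal{P}_0(L)$, i.e. $a \cdot p_\wedge \neq 0$. All names are hypothetical.

```python
from itertools import combinations

ATOMS = frozenset({0, 1, 2})
L = [frozenset(c) for r in range(4) for c in combinations(sorted(ATOMS), r)]
ZERO = frozenset()


def inf(P):
    out = ATOMS
    for p in P:
        out &= p
    return out


def C_and(P):
    """Consequences of P: non-zero q above inf(P)."""
    m = inf(P)
    return {q for q in L if q != ZERO and m <= q}


def implies(a, b):
    """Material implication a' + b."""
    return (ATOMS - a) | b


def deduction_holds():
    """Check a -> b in C(P)  <=>  b in C(P u {a}) for small premise sets."""
    for r in range(1, 4):
        for P in combinations(L, r):
            if inf(P) == ZERO:
                continue                      # P is not in P_0(L)
            for a in L:
                if inf(P) & a == ZERO:
                    continue                  # keep P u {a} in P_0(L)
                for b in L:
                    lhs = implies(a, b) in C_and(P)
                    rhs = b in C_and(set(P) | {a})
                    if lhs != rhs:
                        return False
    return True
```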



Lemma 2 Is Not Valid for All Implications. Let L be the Boolean algebra
shown in Figure 1 and consider the following implications:

$a \to_1 b = 1$ if $a \le b$, and $0$ otherwise;

$a \to_2 b = a \cdot b$.

Let $P = \{a\}$, so $C_\wedge(P) = \{a, b', c', 1\}$. Now, let us consider $C_\wedge(P \cup \{b'\}) =
\{a, b', c', 1\}$. If Lemma 2 were true, “$a \in C_\wedge(P \cup \{b'\}) \Rightarrow b' \to_1 a \in C_\wedge(P)$”, but
$b' \to_1 a = 0 \notin C_\wedge(P)$.

Similarly, if $P = \{a'\}$, $C_\wedge(P) = \{a', 1\}$ and $C_\wedge(P \cup \{c'\}) = \{b, a', c', 1\}$, but
$b \in C_\wedge(P \cup \{c'\})$ and $c' \to_2 b = c' \cdot b = b \notin C_\wedge(P)$.
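
The two counterexamples can be replayed mechanically. This sketch assumes that the algebra of Figure 1 is the power set of three atoms named a, b and c, which matches the sets listed in the text; the function names are illustrative only.

```python
A, B, C = frozenset("a"), frozenset("b"), frozenset("c")
ONE = A | B | C
ZERO = frozenset()
L = [ZERO, A, B, C, A | B, A | C, B | C, ONE]


def comp(x):
    return ONE - x


def inf(P):
    out = ONE
    for p in P:
        out &= p
    return out


def C_and(P):
    m = inf(P)
    return {q for q in L if q != ZERO and m <= q}


def imp1(x, y):
    """x ->_1 y: 1 if x <= y, 0 otherwise."""
    return ONE if x <= y else ZERO


def imp2(x, y):
    """x ->_2 y = x . y."""
    return x & y


def is_implication(imp):
    """Defining property of an implication: x . (x -> y) <= y."""
    return all(x & imp(x, y) <= y for x in L for y in L)


# First counterexample: P = {a}, extended with b'.
P1 = {A}
first = (A in C_and(P1 | {comp(B)}), imp1(comp(B), A) in C_and(P1))

# Second counterexample: P = {a'}, extended with c'.
P2 = {comp(A)}
second = (B in C_and(P2 | {comp(C)}), imp2(comp(C), B) in C_and(P2))
```

In both pairs the first component (the hypothesis of Lemma 2) is true and the second (its conclusion) is false, although both operations satisfy the defining inequality of an implication.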
Fig. 1. The $2^3$ Boolean Algebra



2.2 Theorem of Contradiction 

Let L be an orthocomplemented complete lattice and $P \in \mathcal{P}_0(L)$.

Lemma 3 $\forall a \in L : a \in C_\wedge(P) \Rightarrow a' \cdot p_\wedge = 0$,

which stands for “$a'$ is incompatible with P if a is a consequence of P”.

Proof: If $a \in C_\wedge(P)$, which means $p_\wedge \le a$, it follows that $a' \cdot p_\wedge \le a' \cdot a = 0$,
and $a' \cdot p_\wedge = 0$.

Lemma 4 If L is a distributive lattice, then:

$\forall a \in L : a' \cdot p_\wedge = 0 \Rightarrow a \in C_\wedge(P)$,

which stands for “a is a consequence of P if $a'$ is incompatible with P”.

Proof: If $a' \cdot p_\wedge = 0$, it follows:

$a + (a' \cdot p_\wedge) = a + 0$ (monotonicity of +),
$(a + a') \cdot (a + p_\wedge) = a$ (distributivity),
$a + p_\wedge = a$ (since $a + a' = 1$),
$p_\wedge \le a$ (transitivity, and $p_\wedge \le a + p_\wedge$). Hence $a \in C_\wedge(P)$.

Theorem 2 If L is a complete Boolean algebra:

$\forall a \in L : a \in C_\wedge(P) \Leftrightarrow a' \cdot p_\wedge = 0$,

which is a generalization of the first-order logic theorem of contradiction.

Proof: Immediate, since a complete Boolean algebra is a distributive
orthocomplemented complete lattice.
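
Theorem 2 can be checked in the same brute-force manner. As before, this is a sketch on an assumed example (the Boolean algebra of subsets of {0, 1, 2}), not part of the original argument, and the names are hypothetical.

```python
from itertools import combinations

ATOMS = frozenset({0, 1, 2})
L = [frozenset(c) for r in range(4) for c in combinations(sorted(ATOMS), r)]
ZERO = frozenset()


def inf(P):
    out = ATOMS
    for p in P:
        out &= p
    return out


def C_and(P):
    m = inf(P)
    return {q for q in L if q != ZERO and m <= q}


def contradiction_holds():
    """Check a in C(P)  <=>  a' . inf(P) = 0 for small consistent P."""
    for r in range(1, 4):
        for P in combinations(L, r):
            m = inf(P)
            if m == ZERO:
                continue                      # P is not in P_0(L)
            consequences = C_and(P)
            for a in L:
                incompatible = ((ATOMS - a) & m) == ZERO
                if (a in consequences) != incompatible:
                    return False
    return True
```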




Conjectures, Hypotheses, and Consequences in Orthocomplemented Lattices 111 



3 Another Look at $H_\wedge$ and $C_\wedge$



3.1 The Dualized Operator 

For each operator $F : \mathcal{P}_{01}(L) \to \mathcal{P}(L)$ let the dualized operator be $\delta F$, defined by
$\delta F = c \circ F \circ {}'$, where $' : \mathcal{P}_{01}(L) \to \mathcal{P}_{01}(L)$ is given by $'(P) = P'$, and $c : \mathcal{P}_{01}(L) \to \mathcal{P}(L)$
is $c(P) = P^c = \{q \in L;\ q \notin P\}$. If $P \in \mathcal{P}_{01}(L)$, $\delta F(P) = F(P')^c$ and, obviously:

- $\delta(\delta F) = \delta(c \circ F \circ {}') = c \circ (c \circ F \circ {}') \circ {}' = (c \circ c) \circ F \circ ({}' \circ {}') = F$.

- Defining $F \le G$ by $F(P) \subseteq G(P)$ for all $P \in \mathcal{P}_{01}(L)$, $F \le G$ implies
$F(P') \subseteq G(P')$ and $G(P')^c \subseteq F(P')^c$, or $\delta G \le \delta F$.

- The selfdualized operators, or operators F such that $\delta F = F$, verify $c \circ F =
F \circ {}'$, and these operators are neither strictly monotonic nor strictly
anti-monotonic. For example, if F is strictly anti-monotonic and selfdualized, it
follows from $P \subset Q$ that $F(Q) \subset F(P)$ and then $F(P)^c \subset F(Q)^c$. This is
equivalent to $F(P') \subset F(Q')$, against $P' \subset Q'$.

Hence, neither $\Phi_\vee$, $H_\wedge$ nor $C_\wedge$ are, generally, selfdualized operators. Let us
look at what they change into.



3.2 Are $H_\wedge$ and $C_\wedge$ Too Restrictive?

$\delta\Phi_\vee(P) = \Phi_\vee(P')^c = (\{q \in L - \{0\};\ \mathrm{Sup}(P') \le q'\}^c)^c = \{q \in L - \{0\};\ \mathrm{Sup}\,P' \le
q'\} = \{q \in L - \{0\};\ q \le (\mathrm{Sup}\,P')'\} = \{q \in L - \{0\};\ q \le \mathrm{Inf}\,P\} = H_\wedge(P)$. So
$\delta\Phi_\vee = H_\wedge$ and $\delta H_\wedge = \Phi_\vee$.

Similarly, $\delta\Phi_\wedge(P) = \Phi_\wedge(P')^c = \{q \in L - \{0\};\ \mathrm{Inf}\,P' \le q'\} = \{q \in L -
\{0\};\ q \le \mathrm{Sup}\,P\}$. This set, $H_\vee(P) = \{q \in L - \{0\};\ q \le \mathrm{Sup}\,P\}$, which contains P
and $H_\wedge(P)$, verifies $H_\vee(P) \cap C_\wedge(P) = \{q \in L - \{0\};\ \mathrm{Inf}(P) \le q \le \mathrm{Sup}(P)\} =
C(P)$. The set of restricted consequences of P (see [4] and [5]) was not considered
in [4] as a set of hypotheses of P, because it is uncommon to accept the premises
and consequences of P other than $\mathrm{Inf}(P)$ as hypotheses of P.

However, the operator $H_\vee$ now appears as the dualized operator of $\Phi_\wedge$, like
$H_\wedge$ of $\Phi_\vee$. The set $C_\wedge(P)$ could be too large or the set $H_\wedge(P)$ could be too
small. Are there operators C and H such that

1. $C(P) \subseteq C_\wedge(P)$,

2. $H_\wedge(P) \subseteq H(P)$, and

3. $C(P) \cap H(P) = \{\mathrm{Inf}(P)\}$?

Neither Tarski's consequences operator C of restricted consequences answers
this question, nor will this paper solve the problem. The only new thing related
to (1) is that $C_\wedge$ is the largest of the Tarski's consequences operators in $\mathcal{P}_0(L)$,
where L is a Boolean algebra.
4 Some New Properties of Operators $C_\wedge$ and $\Phi_\wedge$

4.1 $C_\wedge$ Is the Largest Tarski's Operator

Lemma 5 Let $P \in \mathcal{P}_0(L)$ and $x \in L$, where L is a Boolean algebra. Then:
$x \notin C_\wedge(P) \Rightarrow P \cup \{x'\} \in \mathcal{P}_0(L)$.

Proof:

If $P \cup \{x'\} \notin \mathcal{P}_0(L)$, then $p_\wedge \cdot x' = 0$ and, hence, $(p_\wedge \cdot x') + x = x$. As L is
distributive, $(p_\wedge + x) \cdot (x' + x) = x$, thus $(p_\wedge + x) \cdot 1 = x$ and $p_\wedge + x = x$.
Then $p_\wedge \le x$ and, therefore, $x \in C_\wedge(P)$.

Theorem 3 Let L be a Boolean algebra. For every function $C : \mathcal{P}_0(L) \to \mathcal{P}_0(L)$
for which the following properties hold:

- $P \subseteq C(P)$ (expansion), and

- $P \subseteq Q \Rightarrow C(P) \subseteq C(Q)$ (monotonicity),

$C(P) \subseteq C_\wedge(P)$, $\forall P \in \mathcal{P}_0(L)$.

Proof:

Suppose that there is $P \in \mathcal{P}_0(L)$ such that $C(P) \not\subseteq C_\wedge(P)$. Then there
exists $x \in C(P)$ such that $x \notin C_\wedge(P)$. Because of Lemma 5, we have
$P \cup \{x'\} \in \mathcal{P}_0(L)$ and, therefore, it is in the domain of C. However,
$x \in C(P \cup \{x'\})$ by monotonicity and $x' \in C(P \cup \{x'\})$ by expansion.
Hence, $\wedge C(P \cup \{x'\}) = 0$ and $C(P \cup \{x'\}) \notin \mathcal{P}_0(L)$, which is
contradictory. Therefore, $C(P) \subseteq C_\wedge(P)$.

Corollary 1 Let L be a Boolean algebra. Any Tarski's consequences operator
$C : \mathcal{P}_0(L) \to \mathcal{P}_0(L)$ verifies:

$C(P) \subseteq C_\wedge(P)$, $\forall P \in \mathcal{P}_0(L)$.



Remarks: 

1. Distributivity is necessary as the following example shows. Given the typical
non-distributive hexagonal lattice L shown in Figure 2, define the following
consequences operator: $C(\{1\}) = \{1\}$, $C(\{a\}) = C(\{b\}) = C(\{a,b\}) =
C(\{a,1\}) = C(\{b,1\}) = C(\{a,b,1\}) = \{a,b,1\}$ and $C(\{a'\}) = C(\{b'\}) =
C(\{a',b'\}) = C(\{a',1\}) = C(\{b',1\}) = C(\{a',b',1\}) = \{a',b',1\}$. It verifies
the properties of expansion and monotonicity. However, we have

$C(\{a\}) \not\subseteq C_\wedge(\{a\})$ and $C(\{a'\}) \not\subseteq C_\wedge(\{a'\})$.

2. Note the importance of C being defined as a mapping $\mathcal{P}_0(L) \to \mathcal{P}_0(L)$. $\Phi_\vee$,
which is defined between only $\mathcal{P}_0(L)$ and $\mathcal{P}(L)$, verifies $P \subseteq \Phi_\vee(P)$ and
$P \subseteq Q \Rightarrow \Phi_\vee(P) \subseteq \Phi_\vee(Q)$. However, $C_\wedge(P) \subseteq \Phi_\vee(P)$.

3. If the lattice is distributive but not orthocomplemented, Theorem 3 is no
longer valid, as there can be consequences operators that are greater, lesser
and even incomparable with $C_\wedge$ (see [2]).
Fig. 2. Hexagonal Lattice



4.2 $\Phi_\wedge$ Is the Smallest Expansive and Anti-monotonic Operator

Lemma 6 Let $P \in \mathcal{P}_0(L)$ and $x \in L$, where L is a Boolean algebra. Then:
$x \in \Phi_\wedge(P) \Rightarrow P \cup \{x\} \in \mathcal{P}_0(L)$.

Proof:

Suppose that $P \cup \{x\} \notin \mathcal{P}_0(L)$. Then $p_\wedge \cdot x = 0$ and, therefore, $(p_\wedge \cdot x) +
x' = x'$. Because L is distributive, $(p_\wedge + x') \cdot (x + x') = x'$ and, thus,
$(p_\wedge + x') \cdot 1 = x'$ and $p_\wedge + x' = x'$. Then, $p_\wedge \le x'$ and, hence, $x \notin \Phi_\wedge(P)$.



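Theorem 4 Let L be a Boolean algebra. For any function $\Phi : \mathcal{P}_0(L) \to \mathcal{P}(L)$
such that:

- $P \subseteq \Phi(P)$ (expansion), and

- $P \subseteq Q \Rightarrow \Phi(Q) \subseteq \Phi(P)$ (anti-monotonicity),

$\Phi_\wedge(P) \subseteq \Phi(P)$, $\forall P \in \mathcal{P}_0(L)$.

Proof:

Let $x \in \Phi_\wedge(P)$. Because of Lemma 6, $P \cup \{x\} \in \mathcal{P}_0(L)$ and, therefore,
it is in the domain of $\Phi$. Thus, by expansion and anti-monotonicity, we
have

$x \in P \cup \{x\} \subseteq \Phi(P \cup \{x\}) \subseteq \Phi(P)$.

Hence, $\Phi_\wedge(P) \subseteq \Phi(P)$.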
Remarks:

1. Distributivity is necessary as the following example shows. Given the
non-distributive lattice L shown in Figure 2, define the following operator:

$\Phi(\{a\}) = \{a,b,1\}$          $\Phi(\{b'\}) = \{a',b',1\}$
$\Phi(\{b\}) = \{a,b,1\}$          $\Phi(\{a'\}) = \{a',b',1\}$
$\Phi(\{1\}) = \{a,b,1,a',b'\}$    $\Phi(\{a',b'\}) = \{a',b',1\}$
$\Phi(\{a,b\}) = \{a,b,1\}$        $\Phi(\{b',1\}) = \{a',b',1\}$
$\Phi(\{a,1\}) =$                  $\Phi(\{a',1\}) = \{a',b',1\}$
$\Phi(\{b,1\}) = \{a,b,1\}$        $\Phi(\{a',b',1\}) = \{a',b',1\}$
$\Phi(\{a,b,1\}) = \{a,b,1\}$

It verifies the properties of expansion and anti-monotonicity. However,

2. Note the importance of the property $P \subseteq \Phi(P)$. $H_\wedge$ is anti-monotonic, but
$H_\wedge \subseteq \Phi_\wedge$, because $P \not\subseteq H_\wedge(P)$.
Abstract. There are many problems with the simplification of elementary
functions, particularly over the complex plane. Systems tend to
make major errors, or not to simplify enough. In this paper we outline
the “unwinding number” approach to such problems, and show how it
can be used to prevent errors and to systematise such simplification, even
though we have not yet reduced the simplification process to a complete
algorithm. The unsolved problems are probably more amenable to the
techniques of artificial intelligence and theorem proving than the original
problem of complex-variable analysis.

Keywords: Elementary functions; Branch cuts; Complex identities. 

Topics: AI and Symbolic Mathematical Computing; Integration of Log- 
ical Reasoning and Computer Algebra. 

1 Introduction 

The elementary functions are traditionally thought of as log, exp and the trigono- 
metric and hyperbolic functions (and their inverses). This list should include 
powering (to non-integral powers) and also the n-th root. These functions are 
built in, to a greater or lesser extent, to many computer algebra systems (not 
to mention other programming languages [8,12]), and are heavily used. How- 
ever, reasoning with them is more difficult than is usually acknowledged, and all 
algebra systems have one, sometimes both, of the following defects: 

- they make mistakes, be it the traditional schoolchild one

$1 = \sqrt{1} = \sqrt{(-1)^2} = -1$ (1)

or more subtle ones (see footnote 6);

* The authors are grateful to Mrs. A. Davenport for her help with the original of [3], 
and to Dr. D. E. G. Hare of Waterloo Maple for many discussions. 

** This work was performed while this author held the Ontario Research Chair in Com- 
puter Algebra at the University of Western Ontario. Background work was supported 
by the European Commission under Esprit project OpenMath (24.969). 
- they fail to perform obvious simplifications, leaving the user with an
impossible mess when there “ought” to be a simpler answer. In fact, there are two
possibilities here: maybe there is a simpler equivalent that the system has
failed to find, but maybe there isn't, and the simplification that the user
wants is not actually valid, or is only valid outside an exceptional set. In
general, the user is not informed what the simplification might have been,
nor what the exceptional set is.

Faced with these problems, the user of the algebra system is not convinced that
the result is correct, or that the algebra system in use understands the
functions with which it is reasoning. An ideal algebra system would never generate
incorrect results, and would simplify the results as much as practicable, even
though perfect simplification is impossible, and not even totally well-defined: is
$1 + x + \cdots + x^{n}$ “simpler” than $(x^{n+1} - 1)/(x - 1)$?

Throughout this paper, $z$ and its decorations indicate a complex variable,
while $x$, $y$ and $t$ indicate real variables. The symbol $\Im$ denotes the imaginary
part, and $\Re$ the real part, of a complex number. For the purposes of this paper,
the precise definitions of the inverse elementary functions in terms of log are
those of [4]: these are reproduced in Appendix A for ease of reference.

2 The Problem 

The fundamental problem is that log is multi-valued: since $\exp(2\pi i) = 1$, its
inverse is only valid up to adding any multiple of $2\pi i$. This ambiguity is
traditionally resolved by making a branch cut: usually [1, p. 67] the branch cut
$(-\infty, 0]$, and the rule (4.1.2) that

$-\pi < \Im \log z \le \pi$. (2)

This then completely specifies the behaviour of log: on the branch cut it is
continuous with the positive imaginary side of the cut, i.e. counter-clockwise
continuous in the sense of [10].

What are the consequences of this definition^1? From the existence of branch
cuts, we get the problem of a lack of continuity:

$\lim_{y \to 0^-} \log(x + iy) \neq \log x$: (3)

for $x < 0$ the limit is $\log x - 2\pi i$. Related to this is the fact that

$\log \overline{z} \neq \overline{\log z}$ (4)

^1 Which we do not contest: it seems that few people today would support the rule one of
us (JHD) was taught, viz. that $0 \le \Im \log z < 2\pi$. The placement of the branch cut is
“merely” a notational convention, but an important one. If we wanted a function that
behaves like log but with this cut, we could consider $\log_{[0,2\pi)}(z) = \log(-1) - \log(-1/z)$
instead. We note that, until 1925, astronomers placed the branch cut between one
day and the next at noon [7, vol. 15 p. 417].
on the branch cut: instead $\log \overline{z} = \overline{\log z} + 2\pi i$ on the cut. Similarly,

$\log\left(\frac{1}{z}\right) \neq -\log z$ (5)

on the branch cut: instead $\log(1/z) = -\log z + 2\pi i$ on the cut.

Although not normally explained this way, the problem with (1) is a
consequence of the multi-valued nature of log: if we define (as for the purposes of this
paper we do)

$\sqrt{z} = \exp\left(\frac{1}{2} \log z\right)$ (6)

then $-\pi/2 < \arg\sqrt{z} \le \pi/2$. On the real line, this leads to the traditional resolution
of (1), namely that $\sqrt{x^2} = |x|$.

Three families of solutions have been proposed to these problems. 



- Prof. W. Kahan points out that the concept of a “signed zero”^2 [9] (for
clarity, we write the positive zero as $0^+$ and the negative one as $0^-$) can be
used to solve the above problems, if we say that, for $x < 0$, $\log(x + 0^+ i) =
\log x + \pi i$ whereas $\log(x + 0^- i) = \log x - \pi i$. Equation (3) then becomes an
equality for all x, interpreting the x on the right as $x + 0^- i$. Similarly, (4)
and (5) become equalities throughout. Attractive though this proposal is,
it does not answer the fundamental question as far as the designer of a
computer algebra system is concerned: what to do if the user types log(-1).

- The authors of [5] point out that most “equalities” do not hold for the
complex logarithm, e.g. $\log(z^2) \neq 2\log z$ (try $z = -1$), and its generalisation

$\log(z_1 z_2) \neq \log z_1 + \log z_2$. (7)

The most fundamental of all non-equalities is $z = \log \exp z$, whose most
obvious violation is at $z = 2\pi i$. (A similar point was made in [2], where the
correction term is called the “adjustment”.) They therefore propose to
formalise the violation of this equality by introducing the unwinding number
$\mathcal{K}$, defined^3 by

$\mathcal{K}(z) = \frac{z - \log e^z}{2\pi i} = \left\lceil \frac{\Im z - \pi}{2\pi} \right\rceil \in \mathbb{Z}$ (8)

(note that the apparently equivalent definition $\left\lfloor \frac{\Im z + \pi}{2\pi} \right\rfloor$ differs precisely on
the branch cut for log as applied to $\exp z$).



^2 One could ask why zero should be special and have two values. The answer seems to
be that all the branch cuts we need to consider are on either the real or imaginary
axes, so the side to which the branch cut adheres depends on the sign of the imaginary
or real part, including the sign of zero. To handle other points similarly would require
the arithmetic of non-standard analysis.

^3 Note that the sign convention here is the opposite to that of [5], which defined $\mathcal{K}(z)$
as $\left\lfloor \frac{\pi - \Im z}{2\pi} \right\rfloor$: the authors of [5] recanted later to keep the number of $-1$s occurring in
formulae to a minimum. We could also change “unwinding” to “winding” when we
make that sign change; but “winding number” is in wide use for other contexts, and
it seems best to keep the existing terminology.
This definition has several attractive features: $\mathcal{K}(z)$ is integer-valued, and
familiar in the sense that “everyone knows” that the multivalued logarithm
can be written as the principal branch “plus $2\pi i k$ for some integer $k$”; it
is single-valued; and it can be computed by a formula not involving
logarithms. It does have a numerical difficulty, namely that you must decide if
the imaginary part is an odd integer multiple of $\pi$ or not, and this can be
hard (or impossible in some exact arithmetic contexts), but the difficulty is
inherent in the problem and cannot be repaired e.g. by putting the branch
cuts elsewhere.

Some correct identities for elementary functions using $\mathcal{K}$ are given in Table 1.

1. $z = \log e^z + 2\pi i\,\mathcal{K}(z)$.

2. $\mathcal{K}(a \log z) = 0\ \forall z \in \mathbb{C}$ if and only if $-1 < a \le 1$.

3. $\log z_1 + \log z_2 = \log(z_1 z_2) + 2\pi i\,\mathcal{K}(\log z_1 + \log z_2)$.

4. $a \log z = \log z^a + 2\pi i\,\mathcal{K}(a \log z)$.

5. $z^{ab} = (z^a)^b\, e^{2\pi i b\,\mathcal{K}(a \log z)}$.

Table 1. Some correct identities for logarithms and powers using $\mathcal{K}$.

(7) can then be rescued as

$\log(z_1 z_2) = \log z_1 + \log z_2 - 2\pi i\,\mathcal{K}(\log z_1 + \log z_2)$. (9)

Similarly (4) can be rescued as

$\log \overline{z} = \overline{\log z} - 2\pi i\,\mathcal{K}(\overline{\log z})$. (10)

Note that, as part of the algebra of $\mathcal{K}$, $\mathcal{K}(\overline{\log z}) = \mathcal{K}(-\log z) \neq \mathcal{K}(\log 1/z)$.
$\mathcal{K}(z)$ depends only on the imaginary part of $z$.

- Although not formally proposed in the same way in the computational
community, one possible solution, often found in texts in complex analysis, is
to accept the multi-valued nature of these functions (we adopt the
common convention of using capital letters, e.g. Ln, to denote the multi-valued
function), defining, for example

$\mathrm{Arcsin}\, z = \{y \mid \sin y = z\}$.

This leads to $\mathrm{Sqrt}(z^2) = \{\pm z\}$, which has the advantage that it is valid
throughout $\mathbb{C}$. Equation (7) is then rewritten as

$\mathrm{Ln}(z_1 z_2) = \mathrm{Ln}\, z_1 + \mathrm{Ln}\, z_2$, (11)

where addition is addition of sets ($A + B = \{a + b : a \in A, b \in B\}$) and
equality is set equality^4.

^4 “The equation merely states that the sum of one of the (infinitely many) logarithms
of $z_1$ and one of the (infinitely many) logarithms of $z_2$ can be found among the
However, it seems to lead in practice to very large and confusing formulae.
More fundamentally, this approach does not say what will happen when the
multi-valued functions are replaced by the single-valued ones of numerical
programming languages.

A further problem that has not been stressed in the past is that this approach
suffers from the same aliasing problem that naive interval arithmetic does [6].
For example,

$\mathrm{Ln}(z^2) = \mathrm{Ln}\, z + \mathrm{Ln}\, z \neq 2\,\mathrm{Ln}\, z$,

since $2\,\mathrm{Ln}(z) = \{2\log(z) + 4k\pi i : k \in \mathbb{Z}\}$, but $\mathrm{Ln}(z) + \mathrm{Ln}(z) = \{2\log(z) +
2k\pi i : k \in \mathbb{Z}\}$: indeed if $z = -1$, $\log(z^2) \notin 2\,\mathrm{Ln}(z)$. Hence this method is
unduly pessimistic: it may fail to prove some identities that are true.
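
The aliasing effect can be seen on a finite window of branches. The sketch below is only an illustration: it truncates each multi-valued logarithm to the branches with $|k| \le 3$ (an arbitrary choice), and the helper names `Ln` and `contains` are hypothetical.

```python
import cmath
import math


def Ln(z, kmax=3):
    """Finite window of the multi-valued logarithm: log z + 2*pi*i*k, |k| <= kmax."""
    return [cmath.log(z) + 2j * math.pi * k for k in range(-kmax, kmax + 1)]


def contains(values, w, tol=1e-9):
    """Membership test up to rounding error."""
    return any(abs(v - w) < tol for v in values)


z = -1
set_sum = [u + v for u in Ln(z) for v in Ln(z)]   # Ln z + Ln z as a set sum
doubled = [2 * u for u in Ln(z)]                  # 2 Ln z
target = cmath.log(z * z)                         # principal value log(z^2)

in_sum = contains(set_sum, target)
in_doubled = contains(doubled, target)
```

For $z = -1$ the principal value $\log(z^2) = 0$ is found in the set sum but not in the doubled set, which is the discrepancy described above.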



3 The Role of the Unwinding Number 

We claim that the unwinding number provides a convenient formalism for rea- 
soning about these problems. Inserting the unwinding number systematically al- 
lows one to make “simplifying” transformations that are mathematically valid. 
The unwinding number can be evaluated at any point, either symbolically or 
via guaranteed arithmetic: since we know it is an integer, in practice little ac- 
curacy is necessary. Conversely, removing unwinding numbers lets us genuinely 
“simplify” a result. We describe insertion and removal as separate steps, but 
in practice every unwinding number, once inserted by a “simplification” rule, 
should be eliminated as soon as possible. We have thus defined a concrete goal
for mathematically valid simplification.^5

The following section gives examples of reasoning with unwinding numbers. 
Having motivated the use of unwinding numbers, the subsequent sections deal 
with their insertion (to preserve correctness) and their elimination (to simplify 
results). 
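
To make the evaluation of the unwinding number concrete, here is a minimal numerical sketch based on definition (8) and on identities 1 and 3 of Table 1. It uses floating-point arithmetic from Python's standard `cmath` module rather than guaranteed arithmetic, so it should be read as an illustration only; the function names are hypothetical.

```python
import cmath
import math


def unwinding(z):
    """K(z) = ceil((Im z - pi) / (2 pi)), as in equation (8)."""
    return math.ceil((z.imag - math.pi) / (2 * math.pi))


def check_log_exp(z, tol=1e-9):
    """Table 1, identity 1: z = log(e^z) + 2*pi*i*K(z)."""
    return abs(cmath.log(cmath.exp(z)) + 2j * math.pi * unwinding(z) - z) < tol


def check_log_product(z1, z2, tol=1e-9):
    """Equation (9): log(z1 z2) = log z1 + log z2 - 2*pi*i*K(log z1 + log z2)."""
    s = cmath.log(z1) + cmath.log(z2)
    return abs(cmath.log(z1 * z2) - (s - 2j * math.pi * unwinding(s))) < tol
```

For example, `unwinding(2j * math.pi)` is 1, reflecting the violation of $z = \log \exp z$ at $z = 2\pi i$ mentioned earlier, while any z with $-\pi < \Im z \le \pi$ has unwinding number 0.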

4 Examples of Unwinding Numbers

This section gives certain examples of the use of unwinding numbers. We should 
emphasise our view that an ideal computer algebra system should do this manip- 
ulation for the user: certainly inserting the unwinding numbers where necessary, 
and preferably also removing/simplifying them where it can. 

4.1 Forms of arccos 

The following example is taken from [4] , showing that two alternative definitions 
of arccos are in fact equal: 

(infinitely many) logarithms of $z_1 z_2$, and conversely every logarithm of $z_1 z_2$ can be
represented as a sum of this kind (with a suitable choice of [elements of] $\mathrm{Ln}\, z_1$ and
$\mathrm{Ln}\, z_2$).” [3, pp. 259-260] (our notation).

^5 Just to remove the terms with unwinding numbers, as is done in some software
systems, could be called “over-simplification.”
Theorem 1.

$\frac{2}{i} \log\left(\sqrt{\frac{1+z}{2}} + i\sqrt{\frac{1-z}{2}}\right) = -i \log\left(z + i\sqrt{1 - z^2}\right)$. (12)

First we prove the correct (and therefore containing unwinding numbers) version
of $\sqrt{z_1 z_2} \overset{?}{=} \sqrt{z_1}\sqrt{z_2}$.

Lemma 1.

$\sqrt{z_1 z_2} = \sqrt{z_1}\sqrt{z_2}\,(-1)^{\mathcal{K}(\log z_1 + \log z_2)}$. (13)

Proof.

$\sqrt{z_1 z_2} = \exp\left(\tfrac{1}{2} \log(z_1 z_2)\right)$
$= \exp\left(\tfrac{1}{2}\left(\log z_1 + \log z_2 - 2\pi i\,\mathcal{K}(\log z_1 + \log z_2)\right)\right)$
$= \sqrt{z_1}\sqrt{z_2}\,\exp\left(-\pi i\,\mathcal{K}(\log z_1 + \log z_2)\right)$.

Lemma 2. Whatever the value of z,

$\sqrt{1 - z}\,\sqrt{1 + z} = \sqrt{1 - z^2}$.

This is a classic example of a result that is “obvious”: the schoolchild just squares
both sides, but in fact that loses information, and the identity requires proof.
To show this, consider the apparently similar “result”^6:

$\sqrt{-i - z}\,\sqrt{-i + z} = \sqrt{-1 - z^2}$.

If we take $z = i/2$, the left-hand side becomes $\sqrt{-3i/2}\,\sqrt{-i/2}$: the inputs to the
square roots^7 have $\arg = -\pi/2$, so the square roots themselves have $\arg = -\pi/4$,
and the product has $\arg = -\pi/2$, and therefore is $-i\sqrt{3}/2$. The right-hand side
is $\sqrt{-3/4} = i\sqrt{3}/2$.

Proof. It is sufficient to show that the unwinding number term in Lemma 1 is
zero. Whatever the value of z, $1 + z$ and $1 - z$ have imaginary parts of opposite
signs. Without loss of generality, assume $\Im z \ge 0$. Then $0 \le \arg(1 + z) \le \pi$
and $-\pi < \arg(1 - z) \le 0$. Therefore their sum, which is the imaginary part of
$\log(1 + z) + \log(1 - z)$, is in $(-\pi, \pi]$. Hence the unwinding number is indeed zero.
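
Lemma 1, Lemma 2 and the misleading “result” at $z = i/2$ can all be spot-checked numerically. The following sketch uses Python's `cmath` principal branches, which follow the convention of (2) for inputs with a positive zero imaginary part; the sample points and function names are assumptions made for illustration.

```python
import cmath
import math


def unwinding(z):
    """K(z) = ceil((Im z - pi) / (2 pi))."""
    return math.ceil((z.imag - math.pi) / (2 * math.pi))


def sqrt_product(z1, z2):
    """Right-hand side of Lemma 1: sqrt(z1) sqrt(z2) (-1)^K(log z1 + log z2)."""
    k = unwinding(cmath.log(z1) + cmath.log(z2))
    return cmath.sqrt(z1) * cmath.sqrt(z2) * (-1) ** k


def lemma1_holds(z1, z2, tol=1e-9):
    return abs(cmath.sqrt(z1 * z2) - sqrt_product(z1, z2)) < tol


def lemma2_holds(z, tol=1e-9):
    return abs(cmath.sqrt(1 - z) * cmath.sqrt(1 + z) - cmath.sqrt(1 - z * z)) < tol


# The apparently similar "result" fails at z = i/2: the two sides differ by a sign.
z = 0.5j
lhs = cmath.sqrt(-1j - z) * cmath.sqrt(-1j + z)
rhs = cmath.sqrt(-1 - z * z)
```

Here `lhs` is close to $-i\sqrt{3}/2$ and `rhs` to $+i\sqrt{3}/2$, in agreement with the computation in the text.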

^6 Maple V.5, in the absence of an explicit declaration that z is complex, will say that
the two are almost never equal, with the difference being $-2i\sqrt{1 + z^2}$, but in fact at
$z = 2i$, the two are equal.

^7 One is tempted to say “arguments of the square root”, but this is easily confused
with the function arg; we use ‘inputs’ instead.
Proof of Theorem 1. Now

$\left[\sqrt{\frac{1+z}{2}} + i\sqrt{\frac{1-z}{2}}\right]^2 = z + i\sqrt{1+z}\,\sqrt{1-z} = z + i\sqrt{1 - z^2}$

by the previous lemma. Also $2\log a = \log(a^2)$ if $\mathcal{K}(2\log a) = 0$, so we need only
show this last stipulation, i.e. that

$-\frac{\pi}{2} < \arg\left(\sqrt{\frac{1+z}{2}} + i\sqrt{\frac{1-z}{2}}\right) \le \frac{\pi}{2}$.

This is trivially true at $z = 0$. If it is false at any point, say $z_0$, then a path from
$z_0$ to 0 must pass through a $z$ where $\arg\left(\sqrt{(1+z)/2} + i\sqrt{(1-z)/2}\right) = \pm\pi/2$,
i.e. $\sqrt{(1+z)/2} + i\sqrt{(1-z)/2} = it$ for $t \in \mathbb{R}$, because, first, arg is continuous
where $|\arg| < \pi/2$, and indeed where $|\arg| < \pi$, and, second, the inputs to arg are
themselves discontinuous only on $z > 1$ and $z < -1$, and on these half-lines, the
arguments in question are 0 and $\pi/2$, which are acceptable. Coming back to the
continuity along the path, we find that by squaring both sides, $z + i\sqrt{1 - z^2} =
-t^2$, i.e. $(z + t^2)^2 = -(1 - z^2)$. Hence $2zt^2 + t^4 = -1$, so $z = -(1 + t^4)/(2t^2) \le -1$,
and in particular is real. On this half-line, as stated before, the argument in
question is $+\pi/2$, which is acceptable. Hence the argument never leaves the
desired range, and the theorem is proved.



4.2 arccos and arccosh 

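$\cos(z) = \cosh(iz)$, so we can ask whether the corresponding relation for the
inverse functions, $\mathrm{arccosh}(z) = i \arccos(z)$, holds. This is known in [4] as the
“couthness” of the arccos/arccosh definitions. The problem reduces, using
equations (20) and (26), to

$2\log\left(\sqrt{\frac{z+1}{2}} + \sqrt{\frac{z-1}{2}}\right) \overset{?}{=} 2\log\left(\sqrt{\frac{1+z}{2}} + i\sqrt{\frac{1-z}{2}}\right)$.

Since $\log a = \log b$ implies $a = b$ (n.b. this is not true for exp, which is part of
the point of this paper), this reduces to

$\sqrt{\frac{z-1}{2}} \overset{?}{=} i\sqrt{\frac{1-z}{2}}$.

By Lemma 1, the right-hand side reduces to $\sqrt{\frac{z-1}{2}}\,(-1)^{\mathcal{K}(\log(-1) + \log(\frac{1-z}{2}))}$. Hence
the two are equal if, and only if, the unwinding number is even (and therefore
zero). This will happen if, and only if, $\arg\left(\frac{1-z}{2}\right) \le 0$, i.e. $\Im z > 0$, or $\Im z = 0$ and
$z < 1$.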
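
The half-plane on which the two definitions agree can be observed numerically. The sketch below transcribes equations (20) and (26) with Python's `cmath` and compares $\mathrm{arccosh}(z)$ with $i \arccos(z)$ at a few points off the real axis; the sample points and names are chosen here for illustration and are not from the paper.

```python
import cmath


def arccos_def(z):
    """Equation (20): (2/i) log( sqrt((1+z)/2) + i sqrt((1-z)/2) )."""
    return (2 / 1j) * cmath.log(cmath.sqrt((1 + z) / 2) + 1j * cmath.sqrt((1 - z) / 2))


def arccosh_def(z):
    """Equation (26): 2 log( sqrt((z+1)/2) + sqrt((z-1)/2) )."""
    return 2 * cmath.log(cmath.sqrt((z + 1) / 2) + cmath.sqrt((z - 1) / 2))


def couth(z, tol=1e-9):
    """True when arccosh(z) = i arccos(z) holds numerically at z."""
    return abs(arccosh_def(z) - 1j * arccos_def(z)) < tol
```

With these definitions the relation holds at sample points in the open upper half-plane and fails at their conjugates in the lower half-plane.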
4.3 arcsin and arctan

The aim of this section is to prove the correct expression for arcsin in terms of
arctan. We note that we need to add unwinding number terms to deal with the
two cuts $\Re z < -1$, $\Im z = 0$ and $\Re z > 1$, $\Im z = 0$.

Theorem 2.

$\arcsin z = \arctan\frac{z}{\sqrt{1 - z^2}} + \pi\,\mathcal{K}(-\log(1 + z)) - \pi\,\mathcal{K}(-\log(1 - z))$. (14)

We start from equations (19) and (21). Writing $w = z/\sqrt{1 - z^2}$, we have

$2i \arctan w = \log(1 + iw) - \log(1 - iw)$
$= \log\frac{1 + iw}{1 - iw} + 2\pi i\,\mathcal{K}\left(\log(1 + iw) - \log(1 - iw)\right)$
$= \log\left[iz + \sqrt{1 - z^2}\right]^2 + 2\pi i\,\mathcal{K}\left(\log(1 + iw) - \log(1 - iw)\right)$
$= 2i \arcsin(z) - 2\pi i\,\mathcal{K}\left(2\log(iz + \sqrt{1 - z^2})\right) + 2\pi i\,\mathcal{K}\left(\log(1 + iw) - \log(1 - iw)\right)$.

The tendency for $\mathcal{K}$ factors to proliferate is clear. To simplify we proceed as
follows. Consider first the term

$\mathcal{K}\left(2\log(iz + \sqrt{1 - z^2})\right)$.

For $|z| < 1$, the real part of the input to the logarithm is positive and hence has
argument in $(-\pi/2, \pi/2)$; therefore $\mathcal{K} = 0$. For $|z| > 1$, we solve for the critical
case in which the input to $\mathcal{K}$ is $-i\pi$ and find only $z = r\exp(i\pi)$, with $r > 1$.
Therefore

$\mathcal{K}\left(2\log(iz + \sqrt{1 - z^2})\right) = \mathcal{K}(-\log(1 + z))$.

Repeating the procedure with

$\mathcal{K}\left(\log(1 + iz/\sqrt{1 - z^2}) - \log(1 - iz/\sqrt{1 - z^2})\right)$

shows that $\mathcal{K} \neq 0$ only for $z > 1$. Therefore

$\mathcal{K}\left(\log(1 + iz/\sqrt{1 - z^2}) - \log(1 - iz/\sqrt{1 - z^2})\right) = \mathcal{K}(-\log(1 - z))$

and so finally we get

$\arctan\frac{z}{\sqrt{1 - z^2}} = \arcsin(z) - \pi\,\mathcal{K}(-\log(1 + z)) + \pi\,\mathcal{K}(-\log(1 - z))$, (15)

and this cannot be simplified further.
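
Equation (14) can be checked numerically, including on the two cuts where the unwinding numbers are non-zero. The sketch below is an illustration under stated assumptions: the helpers `plog` and `psqrt` force a zero imaginary part to be treated as $+0$, so that the convention $-\pi < \arg \le \pi$ of (2) is followed regardless of floating-point signed zeros, and arcsin and arctan are transcribed from equations (19) and (21). All names are hypothetical.

```python
import cmath
import math


def _norm(z):
    """Treat a zero imaginary part as +0, so cuts follow -pi < arg <= pi."""
    z = complex(z)
    return complex(z.real, z.imag + 0.0)


def plog(z):
    return cmath.log(_norm(z))


def psqrt(z):
    return cmath.sqrt(_norm(z))


def K(w):
    """Unwinding number of w."""
    return math.ceil((w.imag - math.pi) / (2 * math.pi))


def arcsin_def(z):
    """Equation (19)."""
    return -1j * plog(psqrt(1 - z * z) + 1j * z)


def arctan_def(z):
    """Equation (21)."""
    return (plog(1 + 1j * z) - plog(1 - 1j * z)) / 2j


def rhs_14(z):
    """Right-hand side of equation (14)."""
    w = z / psqrt(1 - z * z)
    return (arctan_def(w)
            + math.pi * K(-plog(1 + z))
            - math.pi * K(-plog(1 - z)))
```

At $z = 2$ the term $\mathcal{K}(-\log(1 - z))$ equals $-1$, and at $z = -2$ the term $\mathcal{K}(-\log(1 + z))$ equals $-1$; away from the two cuts both terms vanish.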
5 The Unwinding Number: Insertion

We have seen that the systematic insertion of unwinding numbers while applying
many “simplification” rules is necessary for mathematical correctness.

Unwinding numbers are normally inserted by use of equation (9) and its
converse:

$\log\frac{z_1}{z_2} = \log z_1 - \log z_2 - 2\pi i\,\mathcal{K}(\log z_1 - \log z_2)$. (16)

Equation (10) may also be used, as may its close relative (also a special case
of (16))

$\log\frac{1}{z} = -\log z - 2\pi i\,\mathcal{K}(-\log z)$. (17)

In practice, results such as Lemma 1 would also be built in to a simplifier.

The definition of $\mathcal{K}$ gives us

$\log(e^z) = z - 2\pi i\,\mathcal{K}(z)$, (18)

which is another mechanism for inserting unwinding numbers while
“simplifying”. The formulae for other inverse functions are given in Appendix B.

Many other “identities” among inverse functions require unwinding numbers.
For example,

$\arctan x + \arctan y = \arctan\left(\frac{x + y}{1 - xy}\right) + \pi\,\mathcal{K}\left(2i(\arctan x + \arctan y)\right)$.
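
For real x and y with $xy \neq 1$, the arctan addition formula above is easy to test, since the unwinding number only depends on the imaginary part of its input, here $2(\arctan x + \arctan y)$. A minimal sketch, with hypothetical function names:

```python
import math


def unwinding_im(y):
    """Unwinding number of a complex number whose imaginary part is y."""
    return math.ceil((y - math.pi) / (2 * math.pi))


def arctan_sum(x, y):
    """arctan((x+y)/(1-xy)) + pi*K(2i(arctan x + arctan y)) for real x, y, xy != 1."""
    s = math.atan(x) + math.atan(y)
    return math.atan((x + y) / (1 - x * y)) + math.pi * unwinding_im(2 * s)
```

For instance, with $x = y = 2$ the correction term is $+\pi$, and with $x = y = -2$ it is $-\pi$; for $x = 3$, $y = -1$ no correction is needed.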



6 The Unwinding Number: Removal 

It is clearly easier to insert unwinding numbers than to remove them. There are 
various possibilities for the values of unwinding numbers. 

- An unwinding number may be identically zero. This is the case in Lemma 2
and Theorem 1. The aim is then to prove this.

- An unwinding number may be zero everywhere except on certain branch
cuts in the complex plane. This is the case in equation (10), and its relative
$\log(1/z) = -\log z - 2\pi i\,\mathcal{K}(-\log z)$. A less trivial case of this can be seen in
equation (14). Derive has a different definition of arctan to eliminate this, so
that, for Derive, $\arcsin(z) = \arctan_{\mathrm{Derive}}\left(z/\sqrt{1 - z^2}\right)$. This definition can be related to
ours either via unwinding numbers or via $\overline{\arctan_{\mathrm{Derive}}(\overline{z})} = \arctan z$. It is often
possible to disguise this sort of unwinding number, which is often of the
form $\mathcal{K}(-\log(\ldots))$ or $\mathcal{K}(\overline{\log z})$, by resorting to such a “double conjugate”
expression, though as yet we have no algorithm for this. Equally, we have no
algorithm as yet for the sort of simplification we see in section 4.3.

- An unwinding number may divide the complex plane into two regions, one
where it is non-zero and one where it is zero. A typical case of this is given
in section 4.2. Here the proof methodology consists in examining the critical
case, i.e. when the input to $\mathcal{K}$ has imaginary part $\pm\pi$, and examining when
the functions contained in the input to $\mathcal{K}$ themselves have discontinuities.

- An unwinding number may correspond to the usual $+n\pi$: $n \in \mathbb{Z}$ of many
trigonometric identities: examples of this are given in Appendix B.

7 Conclusion 

Unwinding number insertion permits the manipulation of logarithms, square 
roots etc., as well as the cancellation of functions and their inverses, while re- 
taining mathematical correctness. This can be done completely algorithmically, 
and we claim this is one way, the only way we have seen, of guaranteeing math- 
ematical correctness while “simplifying” . 

Unwinding number removal, where it is possible, then simplifies these results 
to the expected form. This is not a process that can currently be done algorith- 
mically, but it is much better suited to current artificial intelligence techniques 
than the general problems of complex analysis. 

When the unwinding numbers cannot be eliminated, they can often be con- 
verted into a case analysis that, while not ideal, is at least comprehensible while 
being mathematically correct. 

More generally, we have reduced the analytic difficulties of simplifying these 
functions to more algebraic ones, in areas where we hope that artificial intelli- 
gence and theorem proving stand a better chance of contributing to the problem. 
A Definition of the Elementary Inverse Functions

These definitions are taken from [4]. They agree with [1, ninth printing], but are
more precise on the branch cuts, and agree with Maple with the exception of
arccot, for the reasons explained in [4].



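$\arcsin z = -i \log\left(\sqrt{1 - z^2} + iz\right)$. (19)

$\arccos(z) = \frac{\pi}{2} - \arcsin(z) = \frac{2}{i} \log\left(\sqrt{\frac{1+z}{2}} + i\sqrt{\frac{1-z}{2}}\right)$. (20)

$\arctan(z) = \frac{1}{2i}\left(\log(1 + iz) - \log(1 - iz)\right)$. (21)

$\mathrm{arccot}\, z = \frac{1}{2i} \log\left(\frac{z + i}{z - i}\right) = \arctan\left(\frac{1}{z}\right)$. (22)

$\mathrm{arcsec}(z) = \arccos(1/z) = -i \log\left(1/z + i\sqrt{1 - 1/z^2}\right)$,
with $\mathrm{arcsec}(0) = \frac{\pi}{2}$. (23)

$\mathrm{arccsc}(z) = \arcsin(1/z) = -i \log\left(i/z + \sqrt{1 - 1/z^2}\right)$,
with $\mathrm{arccsc}(0) = 0$. (24)

$\mathrm{arcsinh}(z) = \log\left(z + \sqrt{1 + z^2}\right)$. (25)

$\mathrm{arccosh}(z) = 2 \log\left(\sqrt{\frac{z+1}{2}} + \sqrt{\frac{z-1}{2}}\right)$. (26)

$\mathrm{arctanh}(z) = \frac{1}{2}\left(\log(1 + z) - \log(1 - z)\right)$. (27)

$\mathrm{arccoth}(z) = \frac{1}{2}\left(\log(-1 - z) - \log(1 - z)\right)$. (28)

$\mathrm{arcsech}(z) = 2 \log\left(\sqrt{\frac{z+1}{2z}} + \sqrt{\frac{1-z}{2z}}\right)$. (29)

$\mathrm{arccsch}(z) = \log\left(\frac{1}{z} + \sqrt{1 + \left(\frac{1}{z}\right)^2}\right)$. (30)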
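
As a sanity check of the transcription, several of these definitions can be compared with the principal-value functions of Python's standard `cmath` module at points away from the branch cuts. This comparison is only a sketch: it is not from the paper, the sample points are arbitrary, and agreement on the cuts themselves is not examined.

```python
import cmath


def arcsin_def(z):      # (19)
    return -1j * cmath.log(cmath.sqrt(1 - z * z) + 1j * z)


def arccos_def(z):      # (20)
    return (2 / 1j) * cmath.log(cmath.sqrt((1 + z) / 2) + 1j * cmath.sqrt((1 - z) / 2))


def arctan_def(z):      # (21)
    return (cmath.log(1 + 1j * z) - cmath.log(1 - 1j * z)) / 2j


def arcsinh_def(z):     # (25)
    return cmath.log(z + cmath.sqrt(1 + z * z))


def arccosh_def(z):     # (26)
    return 2 * cmath.log(cmath.sqrt((z + 1) / 2) + cmath.sqrt((z - 1) / 2))


def arctanh_def(z):     # (27)
    return (cmath.log(1 + z) - cmath.log(1 - z)) / 2


PAIRS = [
    (arcsin_def, cmath.asin), (arccos_def, cmath.acos), (arctan_def, cmath.atan),
    (arcsinh_def, cmath.asinh), (arccosh_def, cmath.acosh), (arctanh_def, cmath.atanh),
]


def max_deviation(points):
    """Largest absolute difference between the formulas and cmath on the points."""
    return max(abs(f(z) - g(z)) for f, g in PAIRS for z in points)
```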
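B Formulae for Inverse Functions

These formulae are taken from [11]. They make use of the secondary function
csgn, which we define below in terms of $\mathcal{K}$ and which was first defined by
Dr. D. E. G. Hare as the piecewise function on the right-hand side:

\[ \operatorname{csgn}(z) = (-1)^{\mathcal{K}(2 \log z)} =
   \begin{cases}
     +1 & \Re(z) > 0 \text{ or } \Re(z) = 0;\ \Im(z) > 0 \\
     -1 & \Re(z) < 0 \text{ or } \Re(z) = 0;\ \Im(z) < 0
   \end{cases} \]

\[ \arcsin(\sin z) =
   \begin{cases}
     z - 2\pi\mathcal{K}(zi) & \operatorname{csgn}(\cos z) = 1 \\
     \pi - z - 2\pi\mathcal{K}(i(\pi - z)) & \operatorname{csgn}(\cos z) = -1
   \end{cases} \tag{31} \]

\[ \arccos(\cos z) =
   \begin{cases}
     z - 2\pi\mathcal{K}(zi) & \operatorname{csgn}(\sin z) = 1 \\
     -z - 2\pi\mathcal{K}(-zi) & \operatorname{csgn}(\sin z) = -1
   \end{cases} \tag{32} \]

\[ \arctan(\tan z) = z + \pi\left(\mathcal{K}(-zi - \log\cos z) - \mathcal{K}(zi - \log\cos z)\right) \tag{33} \]

provided $z \neq \frac{\pi}{2} + n\pi$, $n \in \mathbb{Z}$.

\[ \operatorname{arcsinh}(\sinh z) =
   \begin{cases}
     z - 2\pi i\mathcal{K}(z) & \operatorname{csgn}(\cosh z) = 1 \\
     i\pi - z - 2\pi i\mathcal{K}(i\pi - z) & \operatorname{csgn}(\cosh z) = -1
   \end{cases} \tag{34} \]

\[ \operatorname{arccosh}(\cosh z) =
   \begin{cases}
     z - 2\pi i\mathcal{K}(z) & \operatorname{csgn}(\sinh z)\cos(n\pi) = 1 \\
     -z - 2\pi i\mathcal{K}(-z) & \operatorname{csgn}(\sinh z)\cos(n\pi) = -1
   \end{cases} \tag{35} \]

where $n = \mathcal{K}(\log(\cosh(z) - 1) + \log(\cosh(z) + 1))$.

\[ \operatorname{arctanh}(\tanh z) = z + i\pi\left(\mathcal{K}(-z - \log\cosh z) - \mathcal{K}(z - \log\cosh z)\right) \tag{36} \]

provided $z \neq \frac{\pi}{2}i + in\pi$, $n \in \mathbb{Z}$.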
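Formula (31) and the two definitions of csgn can be checked numerically. The sketch below is only an illustration: it assumes that the unwinding number is $\mathcal{K}(z) = \lceil (\Im z - \pi)/(2\pi) \rceil$, so that $\log(e^z) = z - 2\pi i\mathcal{K}(z)$ (this definition is not restated in this appendix), and it uses Python's `cmath` as the reference implementation of arcsin.

```python
import cmath
import math

def unwinding(z):
    """Unwinding number K(z); the ceiling formula is an assumption of this sketch."""
    return math.ceil((z.imag - math.pi) / (2 * math.pi))

def csgn(z):
    """Piecewise definition of csgn (left undefined at z = 0)."""
    if z.real > 0 or (z.real == 0 and z.imag > 0):
        return 1
    if z.real < 0 or (z.real == 0 and z.imag < 0):
        return -1
    raise ValueError("csgn is not defined at 0 in this sketch")

def csgn_via_unwinding(z):
    """csgn written as (-1)^K(2 log z)."""
    return (-1) ** unwinding(2 * cmath.log(z))

def arcsin_of_sin(z):
    """Right-hand side of formula (31)."""
    if csgn(cmath.cos(z)) == 1:
        return z - 2 * math.pi * unwinding(1j * z)
    return math.pi - z - 2 * math.pi * unwinding(1j * (math.pi - z))

# Arbitrary sample points, real and complex.
SAMPLES = [0.5, 2.0, 7.0, 2 + 1j, -4 - 0.5j]

def max_error():
    """Largest gap between formula (31) and a direct evaluation of arcsin(sin z)."""
    return max(abs(arcsin_of_sin(z) - cmath.asin(cmath.sin(z))) for z in SAMPLES)
```

For instance, at z = 7 the first branch applies with $\mathcal{K}(7i) = 1$, giving $7 - 2\pi$, which is indeed the principal value of arcsin(sin 7).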



The function csgn simplifies $\sqrt{z^2}$ to $z\,\operatorname{csgn}(z)$. Dr. J. Carette observed that if we put
$\omega = \exp(2\pi i/n)$, then a function generalizing csgn, sometimes abbreviated by
$C_n(z)$, is useful in simplifying $(z^n)^{1/n}$ (private communication).
Abstract. A reliable symbolic-numeric algorithm for solving nonlinear
systems over the reals is designed. The symbolic step generates a new 
system, where the formulas are different but the solutions are preserved, 
through partial factorizations of polynomial expressions and constraint 
inversion. The numeric step is a branch-and-prune algorithm based on 
interval constraint propagation to compute a set of outer approximations 
of the solutions. The processing of the inverted constraints by interval 
arithmetic provides a fast and efficient method to contract the variables’ 
domains. A set of experiments for comparing several constraint solvers 
is reported. 



Keywords: AI and symbolic mathematical computing, constraint solving, nonlinear 
system, symbolic-numeric algorithm, interval arithmetic. 



1 Introduction 

Symbolic-numeric algorithms (solvers) processing sets of formulas over the reals
(constraints) have been widely studied in recent years. In this framework, the
symbolic algorithms are mainly devoted to polynomial (or quasi-polynomial)
systems, like Gaussian elimination, Simplex, Gröbner bases [5], CAD [7],
resultants [10], and triangular set-based techniques [2]. The numeric methods,
like Gauss-Seidel, Newton-Raphson or optimisation techniques [15], compute
sequences of approximate solutions or tightened variables' domains until some
convergence properties or required distances to solutions are verified. The
combination of both kinds of constraint solving techniques is a promising approach
to overcome the drawbacks of each individual solver, namely the complexity and the
lack of expressiveness of symbolic algorithms, the approximate nature of numeric
solutions and the restrictive convergence properties of numeric algorithms.

The core constraint solver developed in this work is a numeric branch-and-prune
algorithm [9, 17] iterating two steps: first, an interval-based pruning operator
associated with each constraint of the system to be solved discards from the
variables' domains some of the values that are inconsistent with the constraint
(a technique related to local consistency methods such as arc consistency [11]);
the domain modifications are then propagated to the other constraints for
reinvocation of their pruning operator. When quiescence is reached, a bisection
stage occurs (splitting of the domains) to separate solutions and obtain tighter
domains than could have been obtained by constraint propagation alone.
The use of interval arithmetic (IA) [13] permits the computation of outer
approximations (supersets) of the relations defined by the constraints (solution
set). Let us note that reliability is mandatory for various applications, for
instance in order to ensure the physical meaning of a solution. Further, when
techniques from interval computations are applied, the result of a procedure can
be guaranteed even in the presence of rounding errors inherent in digital
computations [1].

The symbolic part of the solving process is a preprocessing step implementing
constraint factorization and inversion. The factorization of syntactically
equivalent sub-expressions of a constraint permits us to partially tackle the
dependency problem of IA: the multiple occurrences of a variable are considered as
different variables during interval evaluation. As a consequence, interval
computations often over-estimate the real quantities to be approximated. For this
purpose, we define the cross nested form of a quasi-polynomial expression
(polynomial where the coefficients can be complex expressions), which is
intuitively a kind of Horner form of a multivariate quasi-polynomial. It is
computed by an algorithm that iterates the choice of a variable w.r.t. which the
Horner form is computed, until reaching a fixed-point, i.e., no sub-expression can
be factorized. With respect to the nested form [14] that aims at minimizing the
total degree of an expression, the cross nested form aims at reducing the multiple
occurrences of the variables.

The inversion of a constraint w.r.t. a variable consists in generating a new
(syntactically different) equivalent constraint (i.e., with the same solution set)
where this variable is expressed according to the rest of the constraint. For
instance, the constraint obtained from the inversion of $x^3 - y = 2$ w.r.t. x is
$x = \sqrt[3]{2 + y}$. The aim is to have a form of constraint that can be processed
easily by IA to contract the domain of the considered variable. The associated
pruning operator first approximates the range of the right-hand expression (using
IA) and then interprets the relation symbol so as to keep all the solutions of the
constraint (in the case of equality, the domains are intersected). The inversion
of arbitrary constraints extends the framework presented in [6, 16] to process
primitive constraints (constraints with one operation at most). The main idea
is to introduce a new operation symbol to invert each real operation in the
constraint, associated with a method to evaluate it over the intervals. To
illustrate this process, one may cite the Gauss-Seidel method that inverts each
row (a constraint) of a linear system with respect to a column (a variable) to
contract the variables' domains.

The contribution of this paper is twofold: the definition of the cross nested 
form and the design of a branch-and-prune algorithm based on constraint in- 
version and interval arithmetic. Some experimental results show the algorithm’s 
efficiency as well as the possibility for solving difficult nonlinear systems
modelling real applications.

The outline of this paper is as follows: Section 2 introduces some notions
from IA and presents the cross nested form. Constraint inversion is described in
Section 3. The constraint solving algorithm is devised in Section 4. The
experimental results are discussed in Section 5, and some conclusions are stated
in Section 6.



2 Interval Arithmetic 

Interval Arithmetic (IA) has been designed by R. E. Moore [13] for automatically
computing roundoff error bounds of numerical computations. In this paper, it
is used to compute a superset of the range of a real function over a domain. In
this section, some notions from IA are presented, and the cross nested form of
a real function is defined.



2.1 Preliminaries 

Let $\mathbb{R}$ denote the set of real numbers. Let $S = (\mathbb{R}, \mathbb{F}, \{=, \leq, \geq\})$ be a
real-based structure, where $\mathbb{F}$ is a set of operation symbols, and consider a set
of real-valued variables. A term is a syntactic expression built from constants,
operations and variables. Let $V_f$ denote the set of variables occurring in a term
$f$. A constraint is a first-order formula built from terms and relation symbols,
i.e., a nonlinear equality/inequality over the reals. Given an n-ary constraint c,
let $\rho_c$ denote the relation defined by c in the standard interpretation of S, and
$V_c$ the set of variables occurring in c. Two constraints c and c' are said to be
equivalent if $\rho_c = \rho_{c'}$. Let $c \equiv c'$ denote the equivalence of c and c'.

Given $a, b \in \mathbb{R}$, the set of reals $I = \{x \in \mathbb{R} \mid a \leq x \leq b\}$ is an interval,
denoted $[a, b]$ or $[\underline{I}, \overline{I}]$. Practical experiments are often based on the set $\mathbb{I}$
of machine-representable intervals whose bounds are floating-point numbers. Let
$\mathbb{U}$ denote the set of unions of intervals from $\mathbb{I}$. Given a subset $\rho$ of $\mathbb{R}$, let
$\mathrm{Hull}(\rho)$ denote the smallest element of $\mathbb{I}$ (with respect to set inclusion)
enclosing $\rho$.

The main notion from lA is the notion of interval extension of a real func- 
tion/relation [13, 17]. 

Definition 1 (Interval extension). An interval extension (also called interval
form or inclusion function) of a function $f : \mathbb{R}^n \to \mathbb{R}$ is a function $F : \mathbb{I}^n \to \mathbb{I}$
such that for any tuple of intervals $(I_1, \dots, I_n)$ in the domain of f, we have the
following property:

\[ \{f(a_1, \dots, a_n) \mid a_1 \in I_1, \dots, a_n \in I_n\} \subseteq F(I_1, \dots, I_n) \]

An interval extension of a relation $\rho \subseteq \mathbb{R}^n$ is a relation $\Gamma \subseteq \mathbb{I}^n$ such that for
any tuple of intervals $(I_1, \dots, I_n)$, we have the following property:

\[ \exists a_1 \in I_1, \dots, \exists a_n \in I_n,\ (a_1, \dots, a_n) \in \rho \implies (I_1, \dots, I_n) \in \Gamma \]

In other words, an interval extension represents a superset of the associated real
quantity. This property is commonly called the inclusion property or fundamental
theorem of IA. IA operations are set theoretic extensions of the real ones;
given $I, J \in \mathbb{I}$ and an operation $\circ$, we have:

\[ I \circ J = \mathrm{Hull}(\{a \circ b \mid a \in I,\ b \in J\}). \]

In practice, these operations are evaluated by floating point computations over
the bounds of intervals; for instance, we have $[a, b] + [c, d] = [a + c, b + d]$, and
$[a, b] - [c, d] = [a - d, b - c]$, provided the resulting bounds are rounded towards
the infinities. In such an approach, there are many ways to extend a real function.
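To make the rounding requirement concrete, here is a minimal sketch of interval addition, subtraction and multiplication. It is not the implementation used in this paper: the class name `Interval` and the helpers `down` and `up` are hypothetical, and outward rounding is only emulated by moving each computed bound one floating-point step outwards with `math.nextafter` (Python 3.9 or later), which is slightly pessimistic but preserves the inclusion property.

```python
import math
from dataclasses import dataclass

def down(v):
    """Next floating-point number below v (emulates rounding towards -infinity)."""
    return math.nextafter(v, -math.inf)

def up(v):
    """Next floating-point number above v (emulates rounding towards +infinity)."""
    return math.nextafter(v, math.inf)

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d], bounds rounded outwards
        return Interval(down(self.lo + other.lo), up(self.hi + other.hi))

    def __sub__(self, other):
        # [a, b] - [c, d] = [a - d, b - c], bounds rounded outwards
        return Interval(down(self.lo - other.hi), up(self.hi - other.lo))

    def __mul__(self, other):
        # Hull of the four products of the bounds, rounded outwards
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(down(min(products)), up(max(products)))

    def contains(self, v):
        return self.lo <= v <= self.hi
```

For example, `Interval(1.0, 2.0) - Interval(0.5, 3.0)` yields an interval slightly wider than [-2, 1.5], so every difference a - b with a in [1, 2] and b in [0.5, 3] is enclosed.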

The natural interval extension is a componentwise extension of a term
representing the real function: each constant a of the term is replaced with
$\mathrm{Hull}(\{a\})$, each variable with an interval variable, and each operation by the
corresponding interval one. Let us remark that the natural extensions obtained
from different terms of one function are generally different. For instance, let us
consider $f(x) = x^2 - x$ and $g(x) = x(x - 1)$ and their natural extensions F
and G. The range of f (and g) is $[-0.25, 90]$ over the domain $[0, 10]$, though
$F([0, 10]) = [-10, 100]$ and $G([0, 10]) = [-10, 90]$. The inclusion property is
preserved, but the over-estimation of the range cannot be anticipated. This
weakness of IA is known as the dependency problem, which comes from the
decorrelation of the multiple occurrences of one variable during interval
evaluation. Nevertheless, there is one situation when this problem does not
happen, namely when all variables occur only once in the term (theorem from
Moore). The next section will present a new kind of interval extension, based on
the factorization of polynomial terms in order to decrease the number of multiple
occurrences of the variables.
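The figures of this example can be reproduced with a few lines of code. The following sketch represents an interval as a pair `(lo, hi)` and ignores rounding, which is harmless here because all the bounds are small integers; the helper names are ours and purely illustrative.

```python
def sub(i, j):
    """[a, b] - [c, d] = [a - d, b - c]."""
    return (i[0] - j[1], i[1] - j[0])

def mul(i, j):
    """Hull of the four products of the bounds."""
    products = [a * b for a in i for b in j]
    return (min(products), max(products))

def sqr(i):
    """Interval square: unlike mul(i, i), it knows both factors are the same value."""
    lo, hi = i
    if lo <= 0 <= hi:
        return (0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))

def F(x):
    """Natural extension of the term x^2 - x (two occurrences of x)."""
    return sub(sqr(x), x)

def G(x):
    """Natural extension of the term x(x - 1) (two occurrences of x)."""
    return mul(x, sub(x, (1, 1)))

def sampled_range(f, lo, hi, steps=1000):
    """Inner approximation of the range of f obtained by sampling."""
    values = [f(lo + (hi - lo) * k / steps) for k in range(steps + 1)]
    return (min(values), max(values))
```

With `x = (0, 10)`, `F(x)` evaluates to `(-10, 100)` and `G(x)` to `(-10, 90)`, whereas sampling $x^2 - x$ over the same domain gives values between -0.25 and 90: both extensions enclose the range, and both over-estimate it.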

Let us consider a constraint $c : f \bowtie g$. An interval extension of $\rho_c$ can be
obtained from some extensions of f and g, and the interpretation of the relation
symbol $\bowtie$. Given $I, J \in \mathbb{I}$, the relation symbols are interpreted as follows: we
have $I = J$ if $I \cap J \neq \emptyset$, and $I \leq J$ if $\underline{I} \leq \overline{J}$. Given an interval extension F
(resp. G) of f (resp. g), we define an interval extension $\Gamma_c$ of $\rho_c$ as follows:

\[ \Gamma_c = \{(I_1, \dots, I_n) \in \mathbb{I}^n \mid F(I_1, \dots, I_n) \bowtie G(I_1, \dots, I_n)\} \]



2.2 The Cross Nested Form 

The dependency problem of IA requires the use of interval extensions containing
few multiple occurrences of variables. For this purpose, we define the cross nested
form, a new kind of interval extension of a real function based on the factorization
of some common sub-expressions of a term.

Table 1 presents Algorithm CrossNested that partially factorizes a real term
$p : \sum_{i=1}^{k} e_i X_i$, where for all $i \in \{1, \dots, k\}$, $e_i$ is a term and $X_i$ is either 1 or a
product of factors $x^d$, x being a variable and d a positive integer. The result is a
new term for the real function defined by the initial term. The two-step procedure
iterates: 1. the choice of a variable that occurs at least in two sub-terms $X_i$ of p
(otherwise, the term cannot be factorized), and 2. the computation of the Horner
form of p w.r.t. the selected variable. We then define the cross nested form as
follows:

Definition 2 (Cross nested form). Let p be a term representing a real function
$f : \mathbb{R}^n \to \mathbb{R}$, and fix a choice procedure in Algorithm CrossNested. The
cross nested form of p w.r.t. this choice procedure is the natural interval extension
of the term CrossNested(p).

Since the Horner form of a univariate polynomial is optimal for interval
evaluation, we expect that the cross nested form of a multivariate pseudo-polynomial
(that is also a polynomial if the expressions are restricted to real numbers) is
a good approximation for interval evaluation of the range of a real function (see
Example 1).

Example 1. Let $p : x^3y + x^2yz + xw - xwy + 2yz - yw$ be a term. If the variables
y, z and w are seen as constants, then the Horner form of p (w.r.t. x) is
$2yz - yw + x(w - wy + x(yz + xy))$. Given the choice procedure that selects the
variable that occurs the most in the term, the resulting term from CrossNested(p)
is $y(2z - w) + x(w(1 - y) + xy(z + x))$. The number of occurrences of variables is
10 for CrossNested(p), and 14 for p.



Table 1. Computation of the cross nested form of a real function.

function CrossNested(term $p : \sum_{i=1}^{k} e_i X_i$) : term
begin
    $X := \{x \mid \exists (1 \leq i < j \leq k),\ x \in V_{X_i},\ x \in V_{X_j}\}$
    if $X = \emptyset$
    then return p
    else Choose x in X
         return Horner(p, x)
    fi
end

function Horner(term $p : \sum_{i=1}^{k} e_i X_i$, variable x) : term
begin
    $d := \min(\{d_i \in \mathbb{N} \mid \exists i \in \{1, \dots, k\},\ d_i$ is the degree of x in $X_i\})$
        % d is possibly equal to 0 if $\exists i \in \{1, \dots, k\},\ x \notin V_{X_i}$
    $J := \{i \in \mathbb{N} \mid i \in \{1, \dots, k\}$, x occurs in $X_i$ with degree $d\}$
    Let p be $x^d\left(\sum_{j \in J} e_j \frac{X_j}{x^d} + r\right)$      % Factorization of p by $x^d$
    $s :=$ CrossNested$\left(\sum_{j \in J} e_j \frac{X_j}{x^d}\right)$      % Let us remark that $x \notin V_s$
    if $r \neq 0$
    then g := Horner(r, x)
         return $x^d$(CrossNested(g + s))
    else return $x^d s$
    fi
end
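Algorithm CrossNested can be prototyped in a few dozen lines. The sketch below is an illustration under explicit assumptions, not the authors' implementation: a term is a list of triples `(k, inner, mono)` with an integer coefficient, an opaque already-factorized sub-expression (a string) and a monomial stored as a dictionary; the choice procedure picks the shared variable with the largest total degree, ties broken alphabetically, which is one possible reading of "the variable that occurs the most".

```python
import re

def show(poly):
    """Render a list of terms (k, inner, mono) as a string."""
    parts = []
    for k, inner, mono in poly:
        factors = [v if e == 1 else f"{v}^{e}" for v, e in sorted(mono.items())]
        if inner:
            factors.append(inner)
        if abs(k) != 1 or not factors:
            factors.insert(0, str(abs(k)))
        parts.append(("- " if k < 0 else "+ ") + "*".join(factors))
    text = " ".join(parts)
    return text[2:] if text.startswith("+ ") else "-" + text[2:]

def divide(mono, x, d):
    """Monomial divided by x^d (d never exceeds the degree of x in mono)."""
    out = {v: e - d if v == x else e for v, e in mono.items()}
    return {v: e for v, e in out.items() if e > 0}

def times(x, d, poly):
    """x^d * poly; a sum of several terms becomes one opaque factor."""
    if d == 0:
        return poly
    if len(poly) == 1:
        k, inner, mono = poly[0]
        return [(k, inner, {**mono, x: mono.get(x, 0) + d})]
    return [(1, "(" + show(poly) + ")", {x: d})]

def cross_nested(poly):
    # Variables occurring in at least two monomials X_i
    shared = sorted({v for _, _, m in poly for v in m
                     if sum(v in m2 for _, _, m2 in poly) >= 2})
    if not shared:
        return poly
    # Assumed choice procedure: largest total degree, ties broken alphabetically
    x = max(shared, key=lambda v: sum(m.get(v, 0) for _, _, m in poly))
    return horner(poly, x)

def horner(poly, x):
    d = min(m.get(x, 0) for _, _, m in poly)
    low = [(k, i, divide(m, x, d)) for k, i, m in poly if m.get(x, 0) == d]
    rest = [(k, i, divide(m, x, d)) for k, i, m in poly if m.get(x, 0) > d]
    s = cross_nested(low)            # x does not occur in s
    if rest:
        g = horner(rest, x)
        return times(x, d, cross_nested(g + s))
    return times(x, d, s)

def occurrences(text):
    """Number of variable occurrences (x^3 counts once)."""
    return len(re.findall(r"[a-z]", text))

# The term p of Example 1
P = [(1, "", {"x": 3, "y": 1}), (1, "", {"x": 2, "y": 1, "z": 1}),
     (1, "", {"x": 1, "w": 1}), (-1, "", {"x": 1, "w": 1, "y": 1}),
     (2, "", {"y": 1, "z": 1}), (-1, "", {"y": 1, "w": 1})]
```

With these assumptions, `show(cross_nested(P))` returns `x*(x*y*(x + z) + w*(1 - y)) + y*(2*z - w)`, which is the term of Example 1 up to the order of the summands, with 10 variable occurrences against 14 for `show(P)`.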
The cross nested form should be compared with the nested form proposed by
V. Stahl [14]. Let us define the total degree of a product $x_1^{d_1} \times \cdots \times x_n^{d_n}$ as the
sum $d_1 + \cdots + d_n$, and the common total degree of two products $x_1^{d_1} \times \cdots \times x_n^{d_n}$
and $y_1^{d'_1} \times \cdots \times y_m^{d'_m}$ as the sum $\sum_{x_i = y_j} \min(d_i, d'_j)$, i.e., the total degree of their
common sub-terms. The nested form of a term p corresponds to the factorization
at a time of two sub-terms $e_i X_i$ and $e_j X_j$ of p such that the common total degree
of $X_i$ and $X_j$ is maximum. This process is iterated until reaching a fixed-point.
In general, there is no guarantee that both forms can be compared, while the
total degree of the nested form is often smaller than the one of the cross nested
form. Nevertheless, we have the following result:

Proposition 1. Given the choice procedure that selects the variable that occurs
most often in the term, the number of products $x_1^{d_1} \times \cdots \times x_n^{d_n}$ in the term
CrossNested(p) is less than or equal to the number of products in the nested
form.

Let us illustrate both forms of a real term in Example 2. 

Example 2. In this example, the difference between the two forms concerns the
multiple occurrences of variable y in the nested forms, though it occurs only
once in the cross nested forms.

Input polynomial         Nested form               Cross nested form

$x^3y + x^2 + y$           $x^2(xy + 1) + y$           $y(x^3 + 1) + x^2$

$xyz + xy + yz + xz$       $xy(1 + z) + z(x + y)$      $y(x(1 + z) + z) + xz$



3 Constraint Inversion 

The inversion of a constraint w.r.t. a variable is a symbolic procedure that com- 
putes an equivalent constraint whose left-hand term is reduced to this variable. 
In Section 4, each inverted constraint will be used for contracting the domain of 
the considered variable. 

In the following, let us consider an occurrence of a variable x appearing in a 
constraint c. 



3.1 Preconditioning 

The constraints are ordered w.r.t. the numbers of operation symbols contained
in their left-hand terms, as follows:

\[ f \bowtie g \ \preceq\ f' \bowtie' g' \iff op(f) \leq op(f') \]

Let us remark that $\preceq$ is Noetherian.

In order to simplify the presentation of the inversion procedure, each
constraint c to be inverted w.r.t. a variable x is first rewritten in an
equivalent equality constraint x = f. Two new operation symbols $ge : \mathbb{R} \to \mathbb{R}$ and
$le : \mathbb{R} \to \mathbb{R}$ are introduced to remove the inequality relation symbols, such that
for every constraint $c : f \geq g$ and $c' : f' \leq g'$, we have $c \equiv (f = ge(g))$ and
$c' \equiv (f' = le(g'))$. Their interpretations will be given later by means of IA. The
following table sums up the possible preconditioning operations, where f[x] is
the term containing the selected occurrence of x:

Constraint           Preconditioned constraint

$f[x] \geq g$           $f[x] = ge(g)$
$f[x] \leq g$           $f[x] = le(g)$
$g \geq f[x]$           $f[x] = le(g)$
$g \leq f[x]$           $f[x] = ge(g)$
$f[x] = g$           $f[x] = g$
$g = f[x]$           $f[x] = g$



Let prec(c, x) denote the constraint resulting from the preconditioning of c
w.r.t. x. We have the following property:

Property 1. Constraint prec(c, x) is equivalent to c.

Finally, let us remark that systems of equalities can be more easily simplified
than inequalities. Further research may consider the simplification of equalities
obtained from preconditioning. Nevertheless, let us note that new simplification
rules have to be designed for processing the operation symbols ge and le.



3.2 Inversion 

Let c : f = g be an n-ary constraint. Table 2 describes the elementary operations
to invert c. Either the inverted constraint can be expressed w.r.t. the
existing symbols from the real-based structure, or these symbols are no longer
enough. In the latter case, a set of new operation symbols is introduced, and
defined so as to guarantee the constraints' equivalence.



Table 2. Elementary operations for inverting a constraint.

Rule   Constraint c              Inverted constraint inv(c, x)

1      $f[x] + g = h$              $f[x] = h - g$
2      $f[x] - g = h$              $f[x] = h + g$
3      $g + f[x] = h$              $f[x] = h - g$
4      $g - f[x] = h$              $f[x] = g - h$
5      $f[x] \times g = h$              $f[x] = h \div g$
6      $f[x] / g = h$              $f[x] = h * g$
7      $g \times f[x] = h$              $f[x] = h \div g$
8      $g / f[x] = h$              $f[x] = g \div h$
9      $\exp(f[x]) = h$            $f[x] = \mathrm{Log}(h)$
10     $\log(f[x]) = h$            $f[x] = \exp(h)$
11     $f[x]^n = h$, n even        $f[x] = r_n(h)$
12     $f[x]^n = h$, n odd         $f[x] = \sqrt[n]{h}$
13     $x = h$                     $x = h$

To sum up, these new symbols are: $\div$ to invert the real multiplication (Rules
5 and 7), $*$ for the division (Rules 6 and 8), Log for the logarithm (since
$\log : \mathbb{R}^+ \to \mathbb{R}$ is only defined on $\mathbb{R}^+$), and $r_n$ for the power of n (in order to avoid the
disjunction induced by the n-th root with n even).

Proposition 2. For every constraint c and every variable x occurring in c, we
have either $inv(c, x) \prec c$ or inv(c, x) is syntactically equivalent to c. Furthermore,
inv(c, x) is equivalent to c.

Proof. Since $c : f \bowtie g \preceq c' : f' \bowtie' g'$ means that $op(f) \leq op(f')$, it is clear that
for all rules, except for the thirteenth one, we have $inv(c, x) \prec c$. We then remark
that Rule 13 generates an equivalent constraint. Moreover, by simply rewriting
$\rho_c$ we conclude that Rules 1, 2, 3, 4, 10, 12 preserve the equivalence of c and
inv(c, x). For instance, Rule 1 is rewritten as follows:

\[ \rho_c = \{(x_1, \dots, x_n) \mid f[x](x_1, \dots, x_n) + g(x_1, \dots, x_n) = h(x_1, \dots, x_n)\}, \]

where, with a slight abuse of notation, $f(x_1, \dots, x_n)$ stands for the function of
expression f with parameters $x_1, \dots, x_n$. This implies that

\[ \rho_c = \{(x_1, \dots, x_n) \mid f[x](x_1, \dots, x_n) = h(x_1, \dots, x_n) - g(x_1, \dots, x_n)\}. \]

The relation $\rho_c$ then corresponds to a new relation $\rho_{c'}$ where c' is $f[x] = h - g$.
In addition, since the symbols $\div$, $r_n$, $*$ and Log are defined so as to keep the
equivalence property, it follows that for every constraint c, Rules 5, 6, 7, 8, 9,
and 11 generate an equivalent constraint inv(c, x). This ends the proof. □

We then define the inversion of a constraint c as the computation of a
sequence of equivalent constraints by iteratively applying one elementary inversion
operation, until generating two consecutive constraints that are syntactically
equivalent. The procedure always terminates due to the Noetherian property of
ordering $\preceq$. Let Inverse(c, x) denote the last constraint from the sequence. By
Property 1 and Proposition 2, we have the following result:

Proposition 3. For every constraint c and every variable x occurring in c,
Inverse(c, x) is equivalent to c.

The following example illustrates the computation of an inverted constraint: 



Example 3. Let $c : 2xy \geq (x + 1)^2 - 1$ be a constraint. The inverted constraint
with respect to y is:

\[ y = ge((x + 1)^2 - 1) \div (2x) \]

and is unique. By contrast, there are two different inverted constraints with
respect to the two occurrences of x:

\[ \begin{cases} x = ge((x + 1)^2 - 1) \div (2y) \\ x = r_2(le(2xy) + 1) - 1 \end{cases} \]

In practice, it is a challenge to choose correctly the tighter of the two for interval
evaluation.
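The iteration of the elementary rules can be sketched on expression trees. In the following illustration, which is ours and not the authors' code, a term is a nested tuple `(op, arg, ...)`, a variable is a string, and the preconditioning step is assumed to have been applied already, so the input is an equality `lhs = rhs` whose left-hand term contains exactly one occurrence of the variable. The tags `"÷"`, `"Log"`, `"r"` and `"root"` stand for the new symbols.

```python
def occurs(term, x):
    """True if variable x occurs in the term."""
    if isinstance(term, tuple):
        return any(occurs(arg, x) for arg in term[1:])
    return term == x

def invert(lhs, rhs, x):
    """Apply the elementary rules until the left-hand term is reduced to x.

    Returns the right-hand term of the inverted constraint x = ...
    """
    while lhs != x:
        op, *args = lhs
        if op in ("+", "-", "*", "/"):
            a, b = args
            if occurs(a, x):   # f[x] op g = h  (Rules 1, 2, 5, 6)
                rhs = {"+": ("-", rhs, b), "-": ("+", rhs, b),
                       "*": ("÷", rhs, b), "/": ("*", rhs, b)}[op]
                lhs = a
            else:              # g op f[x] = h  (Rules 3, 4, 7, 8)
                rhs = {"+": ("-", rhs, a), "-": ("-", a, rhs),
                       "*": ("÷", rhs, a), "/": ("÷", a, rhs)}[op]
                lhs = b
        elif op == "exp":      # Rule 9
            lhs, rhs = args[0], ("Log", rhs)
        elif op == "log":      # Rule 10
            lhs, rhs = args[0], ("exp", rhs)
        elif op == "pow":      # Rules 11 and 12
            base, n = args
            rhs = ("r", n, rhs) if n % 2 == 0 else ("root", n, rhs)
            lhs = base
        else:
            raise ValueError(f"no inversion rule for {op!r}")
    return rhs

# Example 3 after preconditioning w.r.t. y:  2xy = ge((x + 1)^2 - 1)
SQUARE = ("pow", ("+", "x", 1), 2)
FOR_Y = invert(("*", ("*", 2, "x"), "y"), ("ge", ("-", SQUARE, 1)), "y")

# Example 3 after preconditioning w.r.t. the occurrence of x in (x + 1)^2:
# (x + 1)^2 - 1 = le(2xy)
FOR_X = invert(("-", SQUARE, 1), ("le", ("*", ("*", 2, "x"), "y")), "x")
```

`FOR_Y` encodes $ge((x+1)^2 - 1) \div (2x)$ and `FOR_X` encodes $r_2(le(2xy) + 1) - 1$, as in Example 3. Handling a variable with several occurrences in the left-hand term would require passing the position of the selected occurrence, which this sketch leaves out.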
3.3 Interval Extension and Tightening

Let us consider a constraint $c(x_1, \dots, x_n)$, an integer $k \in \{1, \dots, n\}$, and an
occurrence of $x_k$ in c. Let $\pi_k(\rho_c)$ denote the projection of $\rho_c$ over variable $x_k$,
i.e. the set $\{a_k \in \mathbb{R} \mid \forall i \in \{1, \dots, k-1, k+1, \dots, n\},\ \exists a_i \in \mathbb{R},\ (a_1, \dots, a_n) \in \rho_c\}$.

Let $\mathbf{I} = (I_1, \dots, I_n)$ be the variables' domains, and $x_k = f(x_1, \dots, x_n)$ the
constraint obtained from the inversion of c w.r.t. $x_k$. If one is able to compute an
interval extension F of f, the domain of $x_k$ can be contracted by the following
operation:

\[ I_k := \mathrm{Hull}(I_k \cap F(I_1, \dots, I_n)) \]

while preserving all the elements of $\rho_c$ included in the variables' domains. This
completeness property is guaranteed by Proposition 4.

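Proposition 4. We have $\pi_k(\rho_c \cap \mathbf{I}) \subseteq \mathrm{Hull}(I_k \cap F(I_1, \dots, I_n))$.

Proof. Since c and Inverse(c, $x_k$) are equivalent, we have:
$\rho_c \cap \mathbf{I} = \{(a_1, \dots, a_n) \in \mathbf{I} \mid a_k = f(a_1, \dots, a_n)\}$. By the inclusion property of F, it
follows: $\rho_c \cap \mathbf{I} \subseteq \{(a_1, \dots, a_n) \in \mathbf{I} \mid a_k \in F(I_1, \dots, I_n)\}$. Since the projection
operation preserves an inclusion relation, we have:
$\pi_k(\rho_c \cap \mathbf{I}) \subseteq \{a_k \in I_k \mid a_k \in F(I_1, \dots, I_n)\}$.
Since $\{a_k \in I_k \mid a_k \in F(I_1, \dots, I_n)\} = I_k \cap F(I_1, \dots, I_n)$ and
Hull is extensive, we have: $\pi_k(\rho_c \cap \mathbf{I}) \subseteq \mathrm{Hull}(I_k \cap F(I_1, \dots, I_n))$, which ends the
proof. □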

The tightening operation assumes the computation of an interval extension of
the right-hand term of each inverted constraint. We propose to use the natural
interval extension as defined in Section 2, given the following interval
operations associated with the new symbols introduced for the needs of constraint
inversion.

The symbols $\div$ and $*$ are respectively interpreted as the extended division
and the usual multiplication of IA. We then define:

\[ r_n([a, b]) =
   \begin{cases}
     [-\sqrt[n]{b}, -\sqrt[n]{a}] \cup [\sqrt[n]{a}, \sqrt[n]{b}] & \text{if } a > 0 \text{ and } n \text{ is even} \\
     [-\sqrt[n]{b}, \sqrt[n]{b}] & \text{if } a \leq 0 \leq b \text{ and } n \text{ is even} \\
     \emptyset & \text{otherwise}
   \end{cases} \]

\[ \mathrm{Log}([a, b]) =
   \begin{cases}
     \log([a, b] \cap\, ]0, +\infty[) & \text{if } b > 0 \\
     \emptyset & \text{otherwise}
   \end{cases} \]

\[ le([a, b]) = \,]-\infty, b] \qquad ge([a, b]) = [a, +\infty[ \]

The inclusion property of these interval extensions can be easily verified. To
end the section, let us illustrate the whole process from the inversion to the
tightening operation by an example.

Example 4. Let $c : (x + 1)^3 = y$ be a constraint and $[-10, 10]$, $[1, 8]$ the domains
of x and y respectively. The constraint obtained from the inversion of c w.r.t. x
is $x = \sqrt[3]{y} - 1$. The tightening operation computes:

\[ \mathrm{Hull}([-10, 10] \cap ([1, 2] - 1)) = [0, 1], \]

that is the new domain of x.
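The interval interpretations of the new symbols and the tightening operation can be sketched as follows. This is an illustration under simplifying assumptions, not the prototype of this paper: an interval is a pair `(lo, hi)`, a union of intervals is a list of pairs, rounding is ignored, and when $a \leq 0 \leq b$ the function `r` returns the symmetric interval $[-\sqrt[n]{b}, \sqrt[n]{b}]$ so that negative solutions are kept. All function names are ours.

```python
import math

def root(v, n):
    """Real n-th root, extended to negative v for odd n."""
    return math.copysign(abs(v) ** (1.0 / n), v)

def r(n, i):
    """r_n for even n: all reals whose n-th power lies in i, as a union of intervals."""
    a, b = i
    if b < 0:
        return []
    if a > 0:
        return [(-root(b, n), -root(a, n)), (root(a, n), root(b, n))]
    return [(-root(b, n), root(b, n))]

def Log(i):
    """Logarithm of the positive part of i."""
    a, b = i
    if b <= 0:
        return []
    return [(-math.inf if a <= 0 else math.log(a), math.log(b))]

def le(i):
    return [(-math.inf, i[1])]

def ge(i):
    return [(i[0], math.inf)]

def tighten(domain, pieces):
    """Hull(domain ∩ pieces), or None if the intersection is empty."""
    kept = [(max(domain[0], lo), min(domain[1], hi)) for lo, hi in pieces]
    kept = [(lo, hi) for lo, hi in kept if lo <= hi]
    if not kept:
        return None
    return (min(lo for lo, _ in kept), max(hi for _, hi in kept))

# Example 4: x = cube_root(y) - 1 with x in [-10, 10] and y in [1, 8]
Y = (1.0, 8.0)
EXAMPLE_4 = tighten((-10.0, 10.0), [(root(Y[0], 3) - 1, root(Y[1], 3) - 1)])

# A variant with an even power (our own example): (x + 1)^2 = y, y in [1, 4]
SHIFTED = [(lo - 1, hi - 1) for lo, hi in r(2, (1.0, 4.0))]
```

`EXAMPLE_4` is, up to rounding, the interval [0, 1] of Example 4. In the even-power variant, `SHIFTED` is the union [-3, -2] ∪ [0, 1]; intersecting it with the domain [-1, 10] before taking the hull discards the negative piece, which a single enclosing interval [-3, 1] would not allow.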

4 Constraint Solving 

The constraint solving algorithm is presented in Table 3. The cross nested forms
of all the constraints' expressions are first computed, resulting in the set of
constraints C'. The rest of the process is a classical branch-and-prune iteration
[9, 17]. The computation step consists of four operations:

1. A vector of domains J is extracted from the list of vectors S to be processed;

2. A first-order Taylor expansion of all the equations from C' (i.e., constraints
that have been factorized) is generated, followed by a preconditioning of the
new linear system, as proposed in [8, 17]. The new set of constraints C''
is obtained from the inversion of all the constraints from C' and the linear
system w.r.t. all the variables occurring in it.

3. The chosen domains J are contracted w.r.t. the set of inverted constraints
C'', by Algorithm Prune, a classical AC3-like constraint propagation
algorithm [11, 3]. The tightening operation defined in Section 3 is enforced over
the inverted constraints taken as input. The output K corresponds to a
fixed-point, when no further contraction of domains happens.

4. If the new vector K is precise enough (the width of each component interval
is less than $\varepsilon$), then it is added to the list of output vectors $S_f$; otherwise, it
is split (generally, in 2 or 3 parts in one direction) and each new sub-vector
is added to the list S.

The result is a set of domains' vectors $S_f$ such that every solution of the initial
problem appears at least in one vector from this set (completeness property).
This property follows directly from Proposition 4. Moreover, the algorithm
terminates in finite time (since every step is contracting and the set of intervals
is finite).
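The four operations above can be illustrated on a toy system. The sketch below is a deliberately simplified stand-in for the algorithm of this paper: it solves x + y = 3, xy = 2 over positive domains, uses four hand-written inverted constraints instead of Algorithms CrossNested and InverseAll, omits the Taylor step, replaces the AC3-like propagation list by a round-robin loop, and emulates outward rounding with `math.nextafter` (Python 3.9 or later). All names are ours.

```python
import math

def down(v):
    return math.nextafter(v, -math.inf)

def up(v):
    return math.nextafter(v, math.inf)

def meet(i, j):
    """Intersection of two intervals, or None if it is empty."""
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo <= hi else None

def three_minus(i):
    """Interval extension of t -> 3 - t, rounded outwards."""
    return (down(3 - i[1]), up(3 - i[0]))

def two_over(i):
    """Interval extension of t -> 2 / t, valid for strictly positive i only."""
    return (down(2 / i[1]), up(2 / i[0]))

# Inverted constraints (k, j, F) meaning x_k = F(x_j), with x_0 = x and x_1 = y:
# x = 3 - y, y = 3 - x, x = 2 / y, y = 2 / x
INVERTED = [(0, 1, three_minus), (1, 0, three_minus),
            (0, 1, two_over), (1, 0, two_over)]

def prune(box):
    """Apply the tightening operation until no domain changes."""
    box = list(box)
    changed = True
    while changed:
        changed = False
        for k, j, F in INVERTED:
            new = meet(box[k], F(box[j]))
            if new is None:
                return None          # inconsistent box
            if new != box[k]:
                box[k], changed = new, True
    return tuple(box)

def branch_and_prune(box, eps):
    todo, done = [box], []
    while todo:
        contracted = prune(todo.pop())
        if contracted is None:
            continue
        widths = [hi - lo for lo, hi in contracted]
        if max(widths) < eps:
            done.append(contracted)  # precise enough
        else:                        # bisect the widest domain
            k = widths.index(max(widths))
            lo, hi = contracted[k]
            mid = (lo + hi) / 2
            for part in ((lo, mid), (mid, hi)):
                todo.append(contracted[:k] + (part,) + contracted[k + 1:])
    return done
```

Calling `branch_and_prune(((0.5, 4.0), (0.5, 4.0)), 1e-6)` returns small boxes enclosing the two solutions (1, 2) and (2, 1). Because every bound is rounded outwards and the two halves of a bisected domain share their midpoint, no solution can be lost, which mirrors the completeness property stated above.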

5 Experimental Results 

In this section, two experiments are reported: the comparison of both nested and
cross nested forms w.r.t. the natural form, and the comparison of the
branch-and-prune algorithm with two state-of-the-art systems for solving nonlinear
systems: Numerica [17, 18] implementing interval analysis and constraint
satisfaction techniques, and PHC [20] based on homotopy continuation (let us
remark that PHC is restricted to systems of polynomial equations).

Table 3. The constraint solving algorithm.

BranchAndPrune(C: set of constraints; $\mathbf{I} : \mathbb{I}^n$; $\varepsilon : \mathbb{R}^+$) : set of $\mathbb{I}^n$
begin
    $C' := \emptyset$
    for each constraint $f \bowtie g \in C$ do
        $C' := C' \cup \{$CrossNested$(f) \bowtie$ CrossNested$(g)\}$
    od
    $S := \{\mathbf{I}\}$    $S_f := \emptyset$
    while $S \neq \emptyset$ do
        Get J from S
        $C'' :=$ InverseAll$(C' \cup$ Taylor$(C', J))$
        K := Prune(C'', J)                 % Contraction of J
        if $K \neq \emptyset$ then
            if the width of every component of K is less than $\varepsilon$
            then $S_f := S_f \cup \{K\}$
            else $S := S \cup$ Bisect(K)        % Partitioning of K
            fi
        fi
    od
    return $S_f$
end

InverseAll(C: set of constraints) : set of constraints
begin
    $C' := \emptyset$
    for each constraint $c \in C$ do
        c' := Preprocess(c)                % Preconditioning step
        for each variable occurring in c' do
            Select in c' an occurrence x of the chosen variable
            $C' := C' \cup \{$Inverse$(c', x)\}$      % Inversion of c' w.r.t. x
        od
    od
    return C'
end

Prune(C: set of constraints; $\mathbf{I} : \mathbb{I}^n$) : $\mathbb{I}^n$
begin
    S := C                                 % S is the propagation list
    while $S \neq \emptyset$ and $\mathbf{I} \neq \emptyset$ do
        Get $c : x_k = f(x_1, \dots, x_n)$ from S
        $J := \mathrm{Hull}(I_k \cap F(\mathbf{I}))$              % Tightening operation
        if $J \subsetneq I_k$ then
            $S := S \cup \{c' \in C \mid x_k \in V_{c'}\}$     % Constraint propagation
            $I_k := J$
        else $S := S \setminus \{c\}$
        fi
    od
    return $\mathbf{I}$
end







Martine Ceberio and Laurent Granvilliers



Table 4. Expression of Problem Seyfert-filter.



' m2m4m6 = 0.01 

ahmi = 7/500 
+ ml = 2/25 

6 ^ + mr = 37/50 

m| + m 3 + m| + m§ + mg = 0.9401 

m|m| + m|m| + m|mg + miml + m§m| + m\m\ = 0.038589 

mimsmsmT — bmimsmg + abm 2 mg — am2m<irm = —0.00081 

6 mim 2 m 3 m 4 + am4msmQmr — ahni^mA — abm^rriQ — 0.39/250 

rn^m^ + rn^rn^ — 26m5mgm7 + m%mj + m\mj + b^m§ + b^mg + 6 ^m| = 2.7173/4 
^a,b G [— 10 ®, + 10 ®] mi, . . . , mr e [— 1 , + 1 ] 

The results for Numerica have been extracted from the book [18]. All other
results have been obtained on the same machine, namely a Sun Sparc Ultra 1
(166 MHz). The set of benchmarks originates from both computer algebra and
interval analysis communities [4, 17, 19]. The expression of Problem Seyfert-filter,
modelling a filter design problem, typically illustrates the kind of system to be
tackled, and is given in Table 4.

The comparison of the natural, nested and cross nested forms is done by en- 
forcing the branch-and-prune algorithm on four benchmarks, i.e., the problems 
from our database for which the differences between these three forms are signif- 
icant. The results are reported in Table 5. Each column corresponds to the use 
of the corresponding interval form in Algorithm Preprocess. The computation 
times for the first solution have been collected and the reported figures are the 
ratios w.r.t. the solving time when the natural form is used. 



Table 5. Comparison of interval forms.

Benchmark         Natural form   Nested form   Cross nested form

Seyfert-filter    100%           56%           36%
Rouiller-robot    100%           18%           17%
Caprasse          100%           50%           50%
Czapor-Geddes     100%           80%           60%



The conclusions are the following: the nested form is more accurate than the
natural form, as shown in [14]; as a consequence, the computation is faster since
fewer terms are evaluated and fewer bisections are performed; the cross nested form
seems to be as accurate as the nested form for all problems, and more efficient
for Seyfert-filter, even though only two constraints out of nine have
different nested and cross nested forms.

The computation times from the aforementioned constraint solvers are given
in Table 6: BaP 1 (resp. BaP) is the computation time for finding the first
solution (resp. all solutions) using Algorithm BranchAndPrune. The results from
Numerica and PHC are also reported. A blank cell indicates that a result is not
available. A brief description of the problems is also given (respectively, name,
number of variables, and number of real solutions).

These results show the efficiency of the new algorithm, combining a very
simple interval tightening procedure with fast (quadratic in the number of
variables) symbolic transformations of polynomials. With respect to homotopy
continuation, and more generally to symbolic algorithms, such a symbolic-numeric
algorithm may compute a solution quickly, since no generation of a starting
solution/system is required before the search for the solutions. This is of
particular importance since in practice, a solution satisfying some requirements
is often demanded (this is the case for Problem Seyfert-filter, where the solution
is required to have a physical meaning). In addition, let us note that the cross
nested preprocessing time is negligible and is kept within reasonable bounds
w.r.t. the numeric solving time when the number of variables increases.



Table 6. Comparison of constraint solving algorithms (times in seconds).

Benchmark            Var   Sol   BaP 1     BaP       Numerica   PHC

Nbody                6     12    120.10    1185.00              508.60
Dessin-d'enfant 1    8     6     111.20    1047.50              271.65
Seyfert-filter       9     128   1.75      328.85               22824.20
Noonburg-network     5     11    0.05      80.55                261.25
Bellido-kinematics   9     8     3.50      81.55                689.30
Rouiller-robot       9     24    7.20      53.50                6640.55
Dessin-d'enfant 2    10    3     8.00      57.50                181.40
Neurophysiology      8     8     6.25      46.15     108.00     68.85
Kinematics 2         8     10    0.40      19.20     243.30     624.80
Rose                 3     3     7.80      15.70                126.25
Katsura-magnetism    6     12    0.20      7.20                 14.50
Ku                   10    2     0.75      5.70                 3.80
Caprasse             4     18    0.15      5.00      21.80      31.90
Trinks               6     2     0.45      3.00                 6.25
Wood-function        4     3     0.60      2.10                 2.00
Sendra               2     6     0.10      1.25                 17.65
Brown                5     2     0.10      0.85      2.90       0.70
Kinematics 1         12    16    0.05      0.75      7.20       6929.55
Czapor-Geddes        3     2     0.30      0.75                 1.10
Cyclohexane          3     16    0.10      0.75      1.60       3.65
Cox-Little-O'Shea    3     2     0.05      0.15                 51.15



6 Conclusion 



A symbolic-numeric algorithm for solving general nonlinear systems has been
devised. It combines a preprocessing step of the constraints' expressions, which
simplifies them w.r.t. interval arithmetic, with a numeric branch-and-prune
iterative process to derive a set of precise interval vectors of domains enclosing
the real solutions. The contraction of domains is based on the inversion procedure
of a constraint to generate a new equivalent constraint with respect to which an
efficient, fast tightening operation over intervals can be enforced. A set of
experimental results from a prototype is reported, as well as comparisons with
other systems.

Some directions for further research are sketched. The nested and cross nested 
forms must be clearly compared. In particular, what are the situations where 
one can choose the tighter according to ad hoc criteria? This is also closely 
connected with the use of particular orderings of variables. The combination 
of other interval extensions, such as the Bernstein form and the Taylor form 
of order $k \geq 2$ [12], is a promising approach to design more precise pruning 
algorithms. In this framework, some heuristics have to be developed in order to 
prevent unnecessary/redundant computations. 

Acknowledgements. We are grateful to Frederic Benhamou and Eric Monfroy 
for interesting discussions on these topics. 


Abstract. In this paper, we propose a strategy language for designing 
schemes of constraint solver collaborations: a set of strategy operators 
enables one to design several kinds of collaborations. We exemplify the 
use of this language by describing some well known techniques for solv- 
ing constraints over finite domains and non-linear constraints over real 
numbers via collaboration of solvers. 



1 Introduction 

In constraint programming, the programming process consists of formulating 
problems with constraints. Solutions of these so-called Constraint Satisfaction 
Problems (CSPs) are generated by solvers. Numerous algorithms have been 
developed for solving CSPs and the resulting technology has been successfully 
applied to real-life problems. The design and implementation of these 
constraint solvers is generally an expensive and tedious task. Thus, the idea of 
reusing existing solvers is very attractive, but it also implies that we must have 
tools to integrate them. Even more importantly, since some problems cannot be 
tackled or efficiently solved with a single solver, there is a clear interest in 
integrating several solvers and making them cooperate [19, 4, 13, 20, 18]. This 
is called collaboration of solvers [15]. In order to make solvers collaborate, 
the need for powerful strategy languages to control their integration and 
application has been well recognized [16, 17, 1]. 

The existing approaches are generally not generic: they consider fixed domains 
(linear constraints [4], non-linear constraints over real numbers [18, 13, 3]), 
fixed strategies, or fixed schemes of collaboration (sequential [18, 3], 
asynchronous [13]). In the language BALI, collaborations are specified using 
control primitives and the constraint system is a parameter. Although BALI is 
more generic and flexible, the control capabilities for specifying strategies are 
not always fine enough [17]. In the system COLETTE [7, 8], a solver is viewed as 
a strategy that specifies the order of application of elementary operations 
expressed by transformation rules. 

Extending ideas of BALI and COLETTE, we consider collaborations of solvers 
as strategies that specify the order of application of component solvers. In [9], we 
propose a strategy language for designing component or elementary constraint 
solvers and we exemplify its use by specifying several solvers (such as solvers 
for constraints over finite domains and real numbers). In this paper, we present 
the application of our language for prototyping constraint solving schemes via 
collaboration of solvers. 

The main motivation for this work is to propose a general framework in which 
one can design component constraint solvers as well as solver collaborations. 
This approach makes sense since the design of constraint solvers and the design 
of collaborations require similar methods (strategies are often the same: don’t- 
care, fixed point, iteration, parallel, concurrent, ...). In other words, we propose 
a language for writing component solvers and designing collaborations of several 
solvers at the same level. 

This paper is organized as follows: Section 2 presents basic definitions and 
notations. In Section 3, we present an overview of our strategy language whereas 
in Section 4 we detail its basic operators. In Section 5, we use our language for 
solving constraints over finite domains and real numbers via the collaboration 
of several solvers. Finally, we conclude in Section 6. 



2 Definitions 

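Definition 1 (Constraint Systems and Constraint Solvers). A constraint 
system is a 4-tuple $(\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$ where $\Sigma$ is a first-order signature 
given by a set of function symbols $\mathcal{F}_\Sigma$ and a set of predicate symbols 
$\mathcal{P}_\Sigma$, $\mathcal{D}$ is a $\Sigma$-structure (its domain being denoted by $|\mathcal{D}|$), $\mathcal{V}$ is an 
infinite denumerable set of variables, and $\mathcal{L}$ is a set of constraints: a 
non-empty set of $(\Sigma, \mathcal{V})$-atomic formulae, called atomic constraints, closed 
under conjunction and disjunction. 

We denote by $\bot$ the unsatisfiable constraint and the true constraint by $\top$. 
The set of atomic constraints is denoted by $\mathcal{L}_{At}$. An assignment is a mapping 
$\alpha : \mathcal{V} \to |\mathcal{D}|$. The set of all assignments is denoted by $ASS_{\mathcal{D}}^{\mathcal{V}}$. An 
assignment $\alpha$ extends uniquely to a homomorphism $\alpha : T(\Sigma, \mathcal{V}) \to |\mathcal{D}|$. The 
set of solutions of a constraint $c \in \mathcal{L}$ is the set $Sol_{\mathcal{D}}(c)$ of assignments 
$\alpha \in ASS_{\mathcal{D}}^{\mathcal{V}}$ such that $\alpha(c)$ holds. A constraint $c$ is valid in $\mathcal{D}$ (denoted 
by $\mathcal{D} \models c$) if $Sol_{\mathcal{D}}(c) = ASS_{\mathcal{D}}^{\mathcal{V}}$. We use $Var(c)$ to denote the set of 
variables from $\mathcal{V}$ occurring in the constraint $c$. 

Given a constraint system $(\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$, a solver is a computable function 
$S : \mathcal{L} \to \mathcal{L}$ satisfying the correctness and completeness properties, i.e., 
$\forall C \in \mathcal{L}$, $Sol_{\mathcal{D}}(S(C)) \subseteq Sol_{\mathcal{D}}(C)$ and $Sol_{\mathcal{D}}(C) \subseteq Sol_{\mathcal{D}}(S(C))$. We extend 
$S$ to a constraint system $(\Sigma', \mathcal{D}', \mathcal{V}', \mathcal{L}')$, where $\mathcal{L} \subseteq \mathcal{L}'$, in the 
following way: $\forall C \in \mathcal{L}' \setminus \mathcal{L}$, $S(C) = C$. We say that a constraint $C$ is in 
solved form with respect to $S$ if $S(C) = C$. 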

In order to be able to manipulate specific parts of a constraint, we introduce 
the notions of syntactical form and sub-constraint. 

Definition 2 (Syntactical Forms and Sub-constraints). We say that $C'$ is 
a syntactical form of $C$, denoted by $C' \approx C$, if $C' = C$^1 modulo the 
associativity and commutativity of $\wedge$ and $\vee$, and the distributivity of $\wedge$ 
on $\vee$ and of $\vee$ on $\wedge$. We say that $C' \in \mathcal{L}$ is a sub-constraint of $C$, 
denoted by $C[C']$, if: 

- $C = C'$ 

- or $\exists C_1 \in \mathcal{L}, \omega \in \{\wedge, \vee\}$, $C = C_1 \,\omega\, C'$ 

- or $\exists C_1 \in \mathcal{L}, \omega \in \{\wedge, \vee\}$, $C = C' \,\omega\, C_1$ 

- or $\exists C_1, C_2 \in \mathcal{L}, \omega \in \{\wedge, \vee\}$, $C = C_1 \,\omega\, C_2$ and ($C_1[C']$ or $C_2[C']$) 

A couple $(C', C'')$ such that $C'$ is a sub-constraint of $C''$ and $C'' \approx C$ is 
called an applicant of $C$. We denote by $\mathcal{SF}(C)$ the finite set of all the 
syntactical forms of a constraint $C$: $\mathcal{SF}(C) = \{C' \mid C' \approx C\}$^2. We denote by 
$\mathcal{LA}$ the set of all the lists of applicants, and by $\mathcal{LC}$ the set of all the 
lists of constraints. Generally, we will use $LA$ (respectively $LC$) to denote a 
list of applicants (respectively constraints). We denote by $\mathcal{P}(\mathcal{L} \times \mathcal{L})$ the 
power-set of all the sets of couples of constraints. $Atom(C)$ denotes the set of 
atomic constraints that occur in $C$: $\{c \mid c \in \mathcal{L}_{At} \text{ and } C[c]\}$. 

Finally, in order to explicitly handle sub-parts of a constraint, we define the 
notions of filter, to select specific parts of a constraint, and sorter, to 
classify the elements of a list w.r.t. a given order^3. 

Definition 3 (Filters and Sorters). Given a constraint system $(\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$, 
a filter $\phi$ is a computable function $\phi : \mathcal{L} \to \mathcal{P}(\mathcal{L} \times \mathcal{L})$ such that 
$\phi(C) = \{(Cf_1, C_1), \ldots, (Cf_n, C_n)\}$ for all $C \in \mathcal{L}$, where each $C_i$ is a 
syntactical form of $C$ and $Cf_i$ is a sub-constraint of $C_i$. 

A sorter $Sorter$, w.r.t. a partial order $\preceq$, is a computable function 
$Sorter : \preceq \times \mathcal{P}(\mathcal{L} \times \mathcal{L}) \to \mathcal{LA}$ such that 
$\forall \{(Cf_{i_1}, C_{i_1}), \ldots, (Cf_{i_n}, C_{i_n})\} \in \mathcal{P}(\mathcal{L} \times \mathcal{L})$: 

1. $Sorter(\preceq, \{(Cf_{i_1}, C_{i_1}), \ldots, (Cf_{i_n}, C_{i_n})\}) = [(Cf_1, C_1), \ldots, (Cf_n, C_n)]$ 

2. $\forall k \in [1, \ldots, n], \exists j \in [1, \ldots, n]$, $Cf_{i_j} = Cf_k$ and $C_{i_j} = C_k$ 

3. $\forall j \in [1, \ldots, n-1]$, $Cf_j \preceq Cf_{j+1}$ 

The elements of $\phi(C)$ are called candidates. We define the filter $Id$ which 
returns the initial set of constraints and the order $None$ which returns the 
initial list of candidates. Given two filters $\phi_1$ and $\phi_2$ on $(\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$, 
the composition $\phi_1 ; \phi_2$ defined by $\phi_1(C) \cap \phi_2(C)$ for all $C \in \mathcal{L}$ is also a 
filter on $(\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$. 

3 An Overview of the Strategy Language 

Most of the application mechanisms that we use in our strategy language are 
based on the same technique when applied to a constraint C: 

1. A set $SC$ of candidates is built using the filter $\phi$ on $C$. 

2. The set $SC$ is sorted using the partial order $\preceq$. We obtain $LC$, a sorted 
list of candidates. 

3. The solver $S$ is applied to one (e.g., the “best” w.r.t. $\preceq$) or several 
elements of $LC$. 

4. Each occurrence of the sub-constraint(s) modified by $S$ is replaced in the 
corresponding (w.r.t. candidates) syntactical form of $C$. 

^1 We consider that “=” is purely syntactic. 

^2 The ACD theory defines a finite set of quotient classes that we can effectively filter. 

^3 These transformations are normally hidden in existing solvers. In [9], we detail 
examples of the definition of filters and sorters. 

The idea behind this scheme can be better understood through the following 
example. Suppose we are given the CSP over finite domains: 

$x \in [1, \ldots, 10] \wedge y \in [1, \ldots, 5] \wedge x > y$ 

In order to find a solution we can carry out enumeration as follows: 

- We first filter domain constraints in order to obtain a set of candidates: 

  $\{(x \in [1, \ldots, 10],\ x \in [1, \ldots, 10] \wedge y \in [1, \ldots, 5] \wedge x > y),$ 
  $\ (y \in [1, \ldots, 5],\ x \in [1, \ldots, 10] \wedge y \in [1, \ldots, 5] \wedge x > y)\}$ 

- If we want to use the minimum domain criterion, a sorter will return the 
following sorted list of candidates: 

  $[(y \in [1, \ldots, 5],\ x \in [1, \ldots, 10] \wedge y \in [1, \ldots, 5] \wedge x > y),$ 
  $\ (x \in [1, \ldots, 10],\ x \in [1, \ldots, 10] \wedge y \in [1, \ldots, 5] \wedge x > y)]$ 

- Applying a solver to split the “best” domain constraint we obtain: 

  $(y \in [1, \ldots, 2] \vee y \in [3, \ldots, 5],\ x \in [1, \ldots, 10] \wedge y \in [1, \ldots, 5] \wedge x > y)$ 

- After replacing the original constraint in the corresponding syntactical form 
we finally obtain: 

  $x \in [1, \ldots, 10] \wedge (y \in [1, \ldots, 2] \vee y \in [3, \ldots, 5]) \wedge x > y$ 

This syntactical form is equivalent to the original set of constraints, and once 
the properties of the operators are activated the solving process can continue. 
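
The four steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: a CSP is encoded as a dictionary from variables to finite domains, the constraint x > y is left untouched, and the names filter_domains, sort_min_domain, split_domain and enumeration_step are hypothetical.

```python
# Toy encoding (an assumption of this sketch): a CSP is a dict mapping each
# variable to its finite domain; the constraint x > y is not touched here.

def filter_domains(domains):
    """Filter: one candidate (variable, domain) per domain constraint."""
    return list(domains.items())

def sort_min_domain(candidates):
    """Sorter: candidates ordered by increasing domain size."""
    return sorted(candidates, key=lambda cand: len(cand[1]))

def split_domain(domain):
    """Solver: split a domain into two halves, read as a disjunction."""
    values = sorted(domain)
    mid = len(values) // 2
    return [values[:mid], values[mid:]]

def enumeration_step(domains):
    """Filter, sort, apply the solver to the best candidate, and replace."""
    var, domain = sort_min_domain(filter_domains(domains))[0]
    # Each disjunct is the original CSP with the chosen domain replaced.
    return [{**domains, var: part} for part in split_domain(domain)]

csp = {"x": list(range(1, 11)), "y": list(range(1, 6))}
branches = enumeration_step(csp)
```

With this encoding, branches holds the two disjuncts of the final syntactical form: one with y restricted to [1, 2] and one with y restricted to [3, 4, 5], while the domain of x is unchanged.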



4 The Strategy Language 

Now we briefly present several application mechanisms to apply solvers to 
constraints. We assume that a solver is applied only once to a given set of 
constraints. In the following, we consider given a constraint system 
$CS = (\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$, solvers $S_1, \ldots, S_n$, filters $\phi_1, \ldots, \phi_n$, and partial 
orders $\preceq_1, \ldots, \preceq_n$. 

We also use the notion of separators that are mainly defined to manipulate 
elements of conjunctions and disjunctions of constraints as elements of lists. A 
$\wedge$_separator $\delta_\wedge$ is a function $\delta_\wedge : \mathcal{L} \to \mathcal{LC}$ s.t.: $\forall C \in \mathcal{L}, \exists n \in \mathbb{N}$, 
$\delta_\wedge(C) = [C_1, \ldots, C_n]$ where $C \approx C_1 \wedge \ldots \wedge C_n$. Similarly, a $\vee$_separator 
$\delta_\vee$ is a function $\delta_\vee : \mathcal{L} \to \mathcal{LC}$ such that: $\forall C \in \mathcal{L}, \exists n \in \mathbb{N}$, 
$\delta_\vee(C) = [C_1, \ldots, C_n]$ where $C \approx C_1 \vee \ldots \vee C_n$. 

Finally, we use the notion of a constraint property $p$ on a constraint system 
$(\Sigma, \mathcal{D}, \mathcal{V}, \mathcal{L})$ which is a function from constraints to Booleans (i.e., 
$p : \mathcal{L} \to Boolean$). 

We use five basic operators that are analogous to function compositions and 
that allow one to design solvers by combining “basic” functions (non-decomposable 
solvers), or to create solver collaborations by combining component solvers. 
Consider two solvers $S_i$ and $S_j$. Then, for all $C \in \mathcal{L}$: 

- $S_i^0(C) = C$ (identity) 

- $S_i ; S_j(C) = S_j(S_i(C))$ (solver concatenation) 

- $S_i^n(C) = S_i^{n-1} ; S_i(C)$ if $n > 0$ (solver iteration) 

- $S_i^*(C) = S_i^n(C)$ such that $S_i^{n+1}(C) = S_i^n(C)$ (solver fixed-point) 

- $(S_i, S_j)(C) = S_i(C)$ or $S_j(C)$ (solver don’t-care) 



Property 1. Let $S_i$ and $S_j$ be two solvers. Then, $S_i^0$, $S_i ; S_j$, $S_i^n$, $S_i^*$, 
and $(S_i, S_j)$ are solvers. 
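
Since these five operators are ordinary function combinators, they can be sketched directly as higher-order functions. The Python below is a minimal illustration, assuming a solver is any function from constraints to constraints; the names identity, seq, iterate, fixpoint and dont_care are hypothetical, and the integer "constraints" at the end are toy stand-ins.

```python
import random

def identity(c):
    """S^0: leave the constraint unchanged."""
    return c

def seq(s_i, s_j):
    """S_i ; S_j : apply S_i first, then S_j."""
    return lambda c: s_j(s_i(c))

def iterate(s, n):
    """S^n : apply S exactly n times (n = 0 gives the identity)."""
    def run(c):
        for _ in range(n):
            c = s(c)
        return c
    return run

def fixpoint(s):
    """S^* : apply S until the constraint no longer changes."""
    def run(c):
        while True:
            nxt = s(c)
            if nxt == c:
                return c
            c = nxt
    return run

def dont_care(s_i, s_j, rng=random):
    """(S_i, S_j) : apply one of the two solvers, chosen arbitrarily."""
    return lambda c: rng.choice((s_i, s_j))(c)

# Toy "solvers" over integers, used only to exercise the combinators.
inc = lambda c: c + 1
dbl = lambda c: c * 2
halve = lambda c: c // 2
```

For instance, seq(inc, dbl)(3) first increments and then doubles, and fixpoint(halve) keeps halving until the value stops changing. Note that fixpoint only terminates when the underlying solver eventually reaches a stable constraint.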

We also use high level operators: two operators to apply a solver to specific 
components of a constraint, two operators to apply several solvers on a con- 
straint, and two operators to apply a solver on each component of a conjunction 
or disjunction of constraints. Note that in the following, substitutions apply to 
every occurrence of sub-constraints. 

$dc(S_i, \phi)(C)$: this operator restricts the use of the solver $S_i$ to one randomly 
chosen sub-constraint of a syntactical form of $C$ (obtained using the filter $\phi$). 
For all $C \in \mathcal{L}$, $dc(S_i, \phi)(C) = C'$, where: 

- $[(Cf_1, C_1), \ldots, (Cf_n, C_n)] = \phi(C)$ 

- if there exists $k \in [1, \ldots, n]$ such that $S_i(Cf_k) \neq Cf_k$, then 
$C' = C_k\{Cf_k \mapsto S_i(Cf_k)\}$, otherwise $C' = C$. 

$best(S_i, \preceq, \phi)(C)$: this operator restricts the use of the solver $S_i$ to the 
best (w.r.t. the partial order $\preceq$) sub-constraint of a syntactical form of $C$ 
(obtained using the filter $\phi$) that $S_i$ is able to modify. For all $C \in \mathcal{L}$, 
$best(S_i, \preceq, \phi)(C) = C'$, where: 

- $[(Cf_1, C_1), \ldots, (Cf_n, C_n)] = Sorter(\preceq, \phi(C))$ 

- if there exists $k \in [1, \ldots, n]$ such that $S_i(Cf_k) \neq Cf_k$, and 
$\forall j \in [1, \ldots, n]$, $(S_i(Cf_j) \neq Cf_j \Rightarrow k \leq j)$, then 
$C' = C_k\{Cf_k \mapsto S_i(Cf_k)\}$, otherwise $C' = C$. 

$pcc(p, [S_1, \preceq_1, \phi_1], \ldots, [S_n, \preceq_n, \phi_n])(C)$: this operator applies once one of 
the solvers $S_i$ and returns a constraint that verifies the property $p$. For all 
$C \in \mathcal{L}$, $pcc(p, [S_1, \preceq_1, \phi_1], \ldots, [S_n, \preceq_n, \phi_n])(C) = C'$, where: 

- for all $i \in [1, \ldots, n]$, 
$[(Cf_{i,1}, C_{i,1}), \ldots, (Cf_{i,m_i}, C_{i,m_i})] = Sorter(\preceq_i, \phi_i(C))$ 

- if there exists $(i, j) \in [1, \ldots, n] \times [1, \ldots, m_i]$ such that $p(S_i(Cf_{i,j}))$ and 
$S_i(Cf_{i,j}) \neq Cf_{i,j}$, then $C' = C_{i,j}\{Cf_{i,j} \mapsto S_i(Cf_{i,j})\}$, otherwise $C' = C$. 



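$bp([S_1, \preceq_1, \phi_1], \ldots, [S_n, \preceq_n, \phi_n])(C)$: this operator applies $n$ solvers 
$S_1, \ldots, S_n$ on $n$ sub-constraints of one syntactical form of the constraint. 
For all $C \in \mathcal{L}$, $bp([S_1, \preceq_1, \phi_1], \ldots, [S_n, \preceq_n, \phi_n])(C) = C'$, where^4: 

- for all $i \in [1, \ldots, n]$, 
$[(Cf_{i,1}, C''), \ldots, (Cf_{i,m_i}, C'')] = Sorter(\preceq_i, \phi_i(C))$ 

- for all $i \in [1, \ldots, n]$, if there exists $j \in [1, \ldots, m_i]$ s.t. 
$S_i(Cf_{i,j}) \neq Cf_{i,j}$, and for all $k < j$, $S_i(Cf_{i,k}) = Cf_{i,k}$, then 
$\sigma_i = \{Cf_{i,j} \mapsto S_i(Cf_{i,j})\}$ else $\sigma_i = \emptyset$. 

- $C' = C''\sigma$ where $\sigma = \bigcup_{i \in [1, \ldots, n]} \sigma_i$ 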

$\wedge\_p(S_i, \delta_\wedge)(C)$: this operator applies (in parallel) the solver $S_i$ to several 
conjuncts (determined by $\delta_\wedge$) of the constraint $C$ and the final result is 
obtained by conjunction of the results computed in parallel. For all $C \in \mathcal{L}$, 
$\wedge\_p(S_i, \delta_\wedge)(C) = C'$, where: 

- $[C_1, \ldots, C_n] = \delta_\wedge(C)$ 

- $C' = S_i(C_1) \wedge \ldots \wedge S_i(C_n)$ 



$\vee\_p(S_i, \delta_\vee)(C)$: this operator is analogous to $\wedge\_p$ but $\delta_\vee$ determines 
disjuncts, and the final result is the disjunction of the results computed in 
parallel. For all $C \in \mathcal{L}$, $\vee\_p(S_i, \delta_\vee)(C) = C'$, where: 

- $[C_1, \ldots, C_n] = \delta_\vee(C)$ 

- $C' = S_i(C_1) \vee \ldots \vee S_i(C_n)$ 



In spite of its simplicity, the following property is essential because it allows 
us to manipulate component functions and solvers at the same level, and thus 
to create solver collaboration with the same strategy language. 

Property 2. Consider $n$ solvers $S_1, \ldots, S_n$, $n$ filters $\phi_1, \ldots, \phi_n$, $n$ partial 
orders $\preceq_1, \ldots, \preceq_n$, a constraint property $p$, and separators $\delta_\wedge$ and $\delta_\vee$. Then, 
$dc(S_i, \phi)$, $best(S_i, \preceq, \phi)$, $pcc(p, [S_1, \preceq_1, \phi_1], \ldots, [S_n, \preceq_n, \phi_n])$, 
$bp([S_1, \preceq_1, \phi_1], \ldots, [S_n, \preceq_n, \phi_n])$, $\wedge\_p(S_i, \delta_\wedge)$, and $\vee\_p(S_i, \delta_\vee)$ are solvers. 
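
To make the behaviour of the high-level operators concrete, the following Python sketch models three of them under strong simplifications that are assumptions of this illustration, not part of the language definition: a constraint is a tuple of atoms read as a conjunction, only one syntactical form is considered, a filter returns positions of candidate atoms, the order is given by a key function, and the "parallel" operator is evaluated sequentially. All names (dc, best, and_p, drop_last, ...) are hypothetical.

```python
import random

def dc(solver, flt, rng=random):
    """dc(S, phi): apply S to one arbitrarily chosen candidate that S modifies."""
    def run(conj):
        changed = [i for i in flt(conj) if solver(conj[i]) != conj[i]]
        if not changed:
            return conj
        i = rng.choice(changed)
        return conj[:i] + (solver(conj[i]),) + conj[i + 1:]
    return run

def best(solver, key, flt):
    """best(S, order, phi): apply S to the first candidate w.r.t. key that S modifies."""
    def run(conj):
        for i in sorted(flt(conj), key=lambda pos: key(conj[pos])):
            new = solver(conj[i])
            if new != conj[i]:
                return conj[:i] + (new,) + conj[i + 1:]
        return conj
    return run

def and_p(solver, separator):
    """and_p(S, delta): apply S to every block of conjuncts and conjoin the results."""
    def run(conj):
        out = ()
        for block in separator(conj):
            out += solver(block)
        return out
    return run

# Toy data: atoms are (variable, tuple of remaining values).
def drop_last(atom):
    var, values = atom
    return (var, values[:-1]) if len(values) > 1 else atom

all_atoms = lambda conj: range(len(conj))      # filter: every atom is a candidate
size = lambda atom: len(atom[1])               # order: smallest domain first
singletons = lambda conj: [(a,) for a in conj] # separator: one block per atom

conj = (("x", (1, 2, 3)), ("y", (1, 2)), ("z", (5,)))
```

Here best(drop_last, size, all_atoms)(conj) skips z, which the toy solver cannot modify, and shrinks y, whereas dc(drop_last, all_atoms)(conj) shrinks either x or y. A real implementation of and_p could dispatch the blocks to separate processes; the sequential loop is only the simplest stand-in.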



5 Some Examples of Solver Collaborations 

In this section we exemplify the use of our strategy language by specifying 
solvers for constraints over finite domains and real numbers. 

^4 Here we need the list of filters $[\phi_1, \ldots, \phi_n]$ to be stable and pairwise disjoint. 




148 Carlos Castro and Eric Monfroy 



5.1 Solving Constraints over Finite Domains 

A CSP $P$ over finite domains is any conjunction of formulae of the form: 

$\bigwedge_{x_i \in X} (x_i \in D_{x_i}) \wedge C$ 

where a domain constraint $x_i \in D_{x_i}$ is created for each variable $x_i$ occurring 
in the constraint $C$, $D_{x_i}$ being a finite set of values. 

Solving this kind of problem can be seen as an interleaving process between 
local consistency verification and enumeration. The most widely used level of 
consistency verification, Arc-Consistency, can be expressed as the repeated 
application of the following transformation rule that reduces the set of possible 
values the variables can take: 

$x_i \in D_{x_i} \wedge c \wedge C \Rightarrow x_i \in RD(x_i \in D_{x_i}, c) \wedge c \wedge C$ if $RD(x_i \in D_{x_i}, c) \neq D_{x_i}$ 

where $RD(x_i \in D_{x_i}, c) = \{v_i \in D_{x_i} \mid (\exists v_1 \in D_{x_1}, \ldots, v_{i-1} \in D_{x_{i-1}}, v_{i+1} \in D_{x_{i+1}}, \ldots, v_n \in D_{x_n}) : c(v_1, \ldots, v_i, \ldots, v_n)\}$. 

Then, we define the solver LocalConsistency which applies this rule. In order 
to carry out enumeration, we consider the solver SplitDomain which transforms 
a domain constraint into a disjunction of two domain constraints if the width of 
the original domain is greater than or equal to a “minimal” width $\epsilon$. For finite 
domains, $\epsilon$ is generally set to 1. For all $c = X \in D_X$ from $C$: 

- if $c \in \mathcal{L}_{Dom}$ such that $width(c) \geq \epsilon$, then 

  $SplitDomain(c) = X \in D'_X \vee X \in D''_X$ 

  where $D_X = D'_X \cup D''_X$^5 

- otherwise, $SplitDomain(c) = c$. 

In order to select domain constraints, we define the filter $\phi_D$ that returns all 
domain constraints of the form $X \in D_X$, where $D_X$ specifies the values that the 
variable $X$ can take. 

We also define the filter $\phi_{D \wedge c \wedge Ds}$ that returns sub-constraints which are the 
conjunction of a domain constraint, an atomic constraint, and a conjunction of 
domain constraints, i.e., an atomic constraint with all the domain constraints 
of the variables occurring in it. 

Finally, we define the sorter $\preceq_{Dom}$ that returns the candidate whose domain 
constraint is the one with the minimum set of values. 

Then, the solver FullLookaheadMinDom, which returns all solutions to a 
CSP over finite domains, is defined in the following way: 

FullLookaheadMinDom = $dc(LocalConsistency, \phi_{D \wedge c \wedge Ds})^*$ ; 
                      $(best(SplitDomain, \preceq_{Dom}, \phi_D)$ ; 
                      $dc(LocalConsistency, \phi_{D \wedge c \wedge Ds})^*)^*$ 

^5 We generally also enforce that $D'_X \cap D''_X = \emptyset$. 

This heuristic first enforces local consistency. Then, it carries out an 
enumeration step on the variable with the minimum set of remaining values, 
followed again by local consistency verification. Local consistency verification 
is always carried out on the whole set of constraints. 

Using $\delta_{var}$, a $\wedge$_separator which splits a set of constraints into $n$ 
variable-disjoint subsets of constraints, the application of FullLookaheadMinDom 
can be improved when solving CSPs that can be decomposed: 

$\wedge\_p(FullLookaheadMinDom, \delta_{var})$ 

In this way, we are solving several CSPs in parallel. The obvious advantage 
is to deal with simpler problems. The solution to the original problem will be in 
the union of the solutions to all subproblems. 
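
The interleaving of propagation and enumeration expressed by FullLookaheadMinDom can be sketched as follows. This is a hedged illustration rather than the strategy language itself: it assumes binary constraints given as (x, y, predicate) triples over distinct variables, replaces the disjunction produced by SplitDomain by explicit recursion over the two branches, and uses hypothetical function names.

```python
def revise(domains, constraints):
    """One pass of the reduction rule: drop values with no support in a constraint."""
    new = dict(domains)
    for x, y, pred in constraints:
        new[x] = [a for a in new[x] if any(pred(a, b) for b in new[y])]
        new[y] = [b for b in new[y] if any(pred(a, b) for a in new[x])]
    return new

def local_consistency(domains, constraints):
    """Fixed point of revise, playing the role of dc(LocalConsistency, ...)*."""
    while True:
        new = revise(domains, constraints)
        if new == domains:
            return domains
        domains = new

def full_lookahead_min_dom(domains, constraints):
    """All solutions: propagate, split the smallest non-singleton domain, recurse."""
    domains = local_consistency(domains, constraints)
    if any(len(d) == 0 for d in domains.values()):
        return []                      # inconsistent branch
    open_vars = [v for v in domains if len(domains[v]) > 1]
    if not open_vars:
        return [{v: d[0] for v, d in domains.items()}]
    var = min(open_vars, key=lambda v: len(domains[v]))  # minimum-domain criterion
    mid = len(domains[var]) // 2
    left, right = domains[var][:mid], domains[var][mid:]
    return (full_lookahead_min_dom({**domains, var: left}, constraints)
            + full_lookahead_min_dom({**domains, var: right}, constraints))

domains = {"x": list(range(1, 11)), "y": list(range(1, 6))}
constraints = [("x", "y", lambda a, b: a > b)]
solutions = full_lookahead_min_dom(domains, constraints)
```

On the running example, solutions contains every pair with x in 1..10, y in 1..5 and x > y. Because the two halves of a split are disjoint, no solution is produced twice.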

5.2 Optimization Problems over Finite Domains 

Here, we concentrate on an extension of a CSP called Constraint Satisfaction 
Optimization Problem (CSOP). CSOP consists in finding an optimal (i.e., max- 
imal or minimal) value for a given function, such that the set of constraints 
is satisfied [21]. The work of Bockmayr and Kasper [5] seems to be the best 
currently available reference that explains the approach generally used by the 
constraint solving community to deal with this problem. In this section, we first 
explain two approaches for solving CSOPs, and then, we show how they can be 
combined, all of that using our strategy language. 

A CSOP can be described by a tuple $(P, f, lb, ub)$ representing a CSP, an 
optimization function, and the lower and upper bounds of this function. Without 
loss of generality, we consider the case of minimization of a function $f$ over 
integers. To deal with this problem, we consider two approaches, both of them 
requiring an initial step verifying that $Sol(C \wedge f < ub) \neq \emptyset$, i.e., there exists 
a solution to the constraint $C$ satisfying the additional constraint $f < ub$. 

The first approach consists in applying the following rule until it cannot be 
applied any more: 

$(P, f, lb, ub) \Rightarrow (P, f, lb, \alpha(f))$ if $\alpha \in Sol(C \wedge f < ub)$ 

Each iteration of this rule tries to decrease the upper bound $ub$ by at least one 
unit until an unsatisfiable problem is obtained. That is why we call this technique 
satisfiability to unsatisfiability. The minimum value of the function $f$ is 
the upper bound of the last successful application of this rule. Thus, we define 
the solver MinSatToUnsat implementing this approach. We do not detail this 
definition here, but the CSPs arising in this approach can clearly be solved 
with the already defined solver FullLookaheadMinDom. 

The second approach applies the following rules until they cannot be applied 
any more: 

$(P, f, lb, ub) \Rightarrow (P, f, lb, \alpha(f))$ if $\alpha \in Sol(C \wedge f < \frac{lb+ub}{2})$ 

$(P, f, lb, ub) \Rightarrow (P, f, \frac{lb+ub}{2}, ub)$ if $lb \neq ub$ and $Sol(C \wedge f < \frac{lb+ub}{2}) = \emptyset$ 

The first rule tries to find a new value for the upper bound $ub$ and reduces, 
by at least one-half, the range of possible values of the function $f$ each time a 
new solution is obtained^6. The second rule similarly updates the lower bound $lb$ 
in the opposite situation. We call this approach binary splitting and we define 
the solver MinSplitting implementing it. 
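
Both optimization schemes can be sketched on top of any procedure that finds one solution under a bound. The Python below is a minimal illustration under stated assumptions: the CSP is solved by brute-force enumeration (find_solution is a hypothetical stand-in for a real solver such as FullLookaheadMinDom), bounds are strict integer bounds, and the midpoint is rounded up so that the loop always makes progress, which is a choice of this sketch rather than something fixed by the rules above.

```python
from itertools import product

def find_solution(domains, constraint, f, bound):
    """Some assignment satisfying the constraint with f < bound, or None."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        a = dict(zip(names, values))
        if constraint(a) and f(a) < bound:
            return a
    return None

def min_sat_to_unsat(domains, constraint, f, ub):
    """Lower ub to f(solution) until 'C and f < ub' becomes unsatisfiable."""
    best = None
    while True:
        a = find_solution(domains, constraint, f, ub)
        if a is None:
            return best
        best, ub = a, f(a)

def min_splitting(domains, constraint, f, lb, ub):
    """Binary splitting: lb must be a valid lower bound of f on the solutions."""
    best = None
    while lb < ub:
        mid = (lb + ub + 1) // 2       # rounded up, so lb < mid <= ub
        a = find_solution(domains, constraint, f, mid)
        if a is not None:
            best, ub = a, f(a)         # new upper bound, below mid
        else:
            lb = mid                   # no solution below mid
    return best

domains = {"x": list(range(1, 11)), "y": list(range(1, 6))}
constraint = lambda a: a["x"] > a["y"]
cost = lambda a: (a["x"] - 7) ** 2 + a["y"]

a1 = min_sat_to_unsat(domains, constraint, cost, ub=100)
a2 = min_splitting(domains, constraint, cost, lb=0, ub=100)
```

On this toy problem both functions return the assignment x = 7, y = 1 with cost 1, but they reach it through different sequences of bounds: the first lowers the upper bound one solution at a time, while the second alternates between lowering ub and raising lb.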

Concerning the behavior of these strategies, we can note that the strategy 
MinSatToUnsat takes a long time to reach the minimal value of $f$ when it is 
located far from the initial upper bound. Conversely, the strategy MinSplitting 
suffers from the same problem when the minimal value of $f$ is close to the 
initial upper bound. Since it is not obvious where the optimal solution is 
located, an a priori choice between these approaches is not possible in the 
general case. To improve the performance of these two basic solvers, we can 
make them collaborate in order to profit from the advantages of both and to 
avoid their drawbacks. 

A first scheme of cooperation between the solvers MinSatToUnsat and 
MinSplitting is expressed by the strategy SeqOpt: 

SeqOpt = (MinSatToUnsat ; MinSplitting)* 

Using the strategy SeqOpt, both solvers are executed sequentially. Its obvious 
disadvantage is that one solver is left inactive while the other one is working. 
Moreover, due to the exponential complexity of the problem under consideration, 
the whole process could be blocked if one solver cannot find a solution. To avoid 
this situation, we can think of running them concurrently, updating the current 
solution as soon as a new one is available, and stopping the other solver: 

ParOpt = (pcc(first, [MinSatToUnsat, None, Id], [MinSplitting, None, Id]))* 

We do not filter the initial set of constraints and so we do not need any sorter. 
In this case, we are interested in whichever solver is faster, which is why we 
use the first property^7. Using this strategy, a solver never waits for a solution 
coming from the other one. In the extreme case that all solutions are read from 
the same elementary solver until the final solution is obtained, the performance 
of this new solver, ParOpt, is the same as if one of the elementary solvers runs 
independently. 
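
The concurrent scheme can be approximated in Python with a thread pool that returns whichever solver answers first. This is only a sketch under several assumptions: the two step functions are toy stand-ins for one application of MinSatToUnsat and MinSplitting (the hidden optimum MINIMUM is invented for the illustration), the state is just the current upper bound, and the slower thread is ignored rather than actually stopped, since Python threads cannot be interrupted.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def pcc_first(*solvers):
    """pcc(first, ...): run every solver on the same input, keep the first result."""
    def run(state):
        pool = ThreadPoolExecutor(max_workers=len(solvers))
        futures = [pool.submit(solver, state) for solver in solvers]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        pool.shutdown(wait=False)      # do not wait for the slower solver
        return next(iter(done)).result()
    return run

def fixpoint(step):
    """S^*: repeat the step until the state no longer changes."""
    def run(state):
        while True:
            nxt = step(state)
            if nxt == state:
                return state
            state = nxt
    return run

MINIMUM = 3  # hidden optimum of the toy problem (assumption of this sketch)

def sat_to_unsat_step(ub):
    """Toy stand-in: lower the upper bound by one unit."""
    return max(MINIMUM, ub - 1)

def splitting_step(ub):
    """Toy stand-in: halve the upper bound."""
    return max(MINIMUM, ub // 2)

par_opt = fixpoint(pcc_first(sat_to_unsat_step, splitting_step))
```

Whichever thread wins each round, par_opt(40) ends at the toy optimum, because both steps only ever move the bound toward it; the sequence of intermediate bounds, however, depends on scheduling.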



^6 Of course, we can think of different ratios. Thus, the first approach can be seen as a 
particular case of the second one. 

^7 Here, since we consider parallel computation, we extend properties of constraints to 
properties of constraints and computations. 



5.3 Combining Symbolic Rewriting and Interval Methods 

Here, we consider systems of non-linear equations, and two solvers. Gröbner 
bases computation [6] (i.e., the gb solver) transforms a set of multivariate 
polynomial equalities into a normal form from which solutions can be derived more 
easily than from the initial set. The second solver, int, is a propagation-based 
numerical solver over the real numbers. We assume that every constraint of the 
CSPs we consider can be processed by int. 

It is generally very efficient to pre-process a CSP with symbolic rewriting 
techniques before applying a propagation-based solver. In fact, the pre-processing 
may add redundant constraints (in order to speed-up propagation), simplify 
constraints, deduce some univariate constraints (whose solutions can easily be 
extracted by propagation), and reduce the variable dependency problem. 

Thus, we consider sc, a simple collaboration where Gröbner bases computation 
pre-processes equality constraints before the interval solver is applied on 
the whole CSP: 

sc = dc(gb, $\phi_=$) ; int 

where the filter $\phi_=$ selects equalities of polynomials. 

Consider, for example, the following problem: 

$x^3 - x y^2 + 2 = 0 \wedge x^2 - y^2 + 2 = 0 \wedge y > 0$ 

Most of the solvers based on propagation require splitting to isolate the 
solutions of this CSP. However, using gb (with a lexicographic order $x \succ y$), the 
problem becomes 

$y^2 - 3 = 0 \wedge -1 + x = 0 \wedge y > 0$ 

and int can easily isolate solutions. 
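
The rewritten system can be checked by hand. The short derivation below assumes the input system reads $x^3 - x y^2 + 2 = 0 \wedge x^2 - y^2 + 2 = 0 \wedge y > 0$, which is the reading consistent with the reduced form $y^2 - 3 = 0 \wedge -1 + x = 0$; it is a plain elimination argument, not a trace of a Gröbner basis algorithm.

```latex
\begin{align*}
x^3 - x y^2 + 2 &= x\,(x^2 - y^2) + 2 \\
                &= -2x + 2 && \text{since } x^2 - y^2 = -2 \text{ by the second equation,} \\
-2x + 2 = 0 \;&\Longrightarrow\; x = 1, \\
1 - y^2 + 2 = 0 \;&\Longrightarrow\; y^2 = 3 .
\end{align*}
```

Together with $y > 0$ this leaves the single real solution $x = 1$, $y = \sqrt{3}$, which a propagation-based solver can enclose without splitting because each remaining equation is univariate.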

However, as stressed in [3], Gröbner bases computation may require too much 
memory and be very time-consuming compared to the speed-up it introduces. 
Thus, in [3] the authors propose a trade-off between pruning and computation 
time: gb is applied on subsets of the initial CSP, and the union of the resulting 
bases and of the constraints that are not rewritten (such as inequalities, and 
equalities of non-polynomial expressions) forms the input of the 
propagation-based solver. We can describe this collaboration as follows: 

$\wedge\_p$(dc(gb, $\phi_=$), $\delta_{part}$) ; int 

where $\delta_{part}$ is the $\wedge$_separator corresponding to the partitioning of the initial 
system introduced in [3]. 



5.4 The Solvers of CoSAc 

CoSAc [18] is a constraint logic programming system for non-linear polynomial 
equalities and inequalities. The solving mechanism of CoSAc consists of five 
heterogeneous solvers working in a distributed environment, and cooperating 
through a client/server architecture: 

- chr_lin [11], implemented with CHRs, for solving linear constraints 
(equalities and inequalities), 

- gb [10] for computing Gröbner bases (note that this solver is itself based on 
a client/server architecture), 

- maple_uni for computing roots of a univariate polynomial equality, i.e., 
maple_uni extracts solutions from one equation, not from a set of equations, 

- maple_exp for simplifying and transforming constraints (both this solver and 
the previous one are Maple [12] programs), and 

- ecl for testing closed inequalities using ECLiPSe [14] features. 

Since CoSAc uses several solving strategies, these solvers cooperate in three 
collaborations: $S_{inc}$, $S_{fin}$ and $S'_{fin}$. We now focus on how these collaborations 
could be described in a simple way using our control language. The collaborations 
are thus clarified: 1) not every constraint can be treated by every solver, and 
filters make this explicit and formal; 2) distributed applications are implicit 
and part of the semantics of the primitives; 3) it becomes clear where 
improvements/strategies can be integrated. 

$S_{inc}$ is the incremental (in the sense of CoSAc) collaboration, i.e., it is applied 
as soon as a new constraint is added to the store. maple_exp transforms all 
constraints (e.g., it expands polynomials and simplifies arithmetic expressions) 
so that eq_lin can propagate information and simplify the set of linear constraints 
(equalities and inequalities) filtered by $\phi_{=,\leq,lin}$. 

$S_{inc}$ = maple_exp ; dc(eq_lin, $\phi_{=,\leq,lin}$) 

$S_{fin}$ is one of the final solvers of CoSAc. It is applied once to the remaining 
constraints. First, constraints are simplified again (by maple_exp) since $S_{inc}$ 
may transform constraints into a syntax gb cannot understand. After computing 
Gröbner bases of the set of non-linear polynomial equalities (filtered by $\phi_=$), 
variables are eliminated (by maple_uni) one by one from univariate polynomials 
(filtered by $\phi_{=,uni}$), solutions are propagated, and linearized equations are solved 
(eq_lin). This process terminates when each variable has been eliminated or when 
there is no more univariate polynomial: 

$S_{fin}$ = maple_exp ; 
        dc(gb, $\phi_=$) ; 
        (dc(maple_uni, $\phi_{=,uni}$) ; dc(eq_lin, $\phi_{=,\leq,lin}$))* 

Here, we can see the flexibility and the simplicity of our control language. In 
CoSAc, the $S_{fin}$ collaboration is fixed. From its description in our language, we 
can notice that maple_uni is applied by a don’t-care primitive. Some strategies 
can easily be introduced to improve the collaboration. In fact, maple_uni could 
be applied with a “best” primitive, ordering possible candidates with respect to 
the increasing degree of univariate polynomial equations (with a $\preceq_{degree}$ sorter). 
Using best(maple_uni, $\preceq_{degree}$, $\phi_{=,uni}$), variables could be eliminated from lower 
degree equations first, and thus fewer arithmetic errors/roundings would be 
propagated to the system (which is a weak point of CoSAc). Concerning gb and 
eq_lin, a “best” primitive would not help since these solvers consider the 
“maximal” set of filtered constraints. 

$S'_{fin}$ is an alternative to $S_{fin}$ which is more efficient when eliminations of 
non-linear variables do not linearize any other constraint and only ground 
inequalities have to be checked by ecl. We can write it as: 

$S'_{fin}$ = maple_exp ; 
         dc(gb, $\phi_=$) ; 
         (dc(maple_uni, $\phi_{=,uni}$))* ; 
         (dc(ecl, $\phi_{\leq,ground}$))* 

Again, strategies can be introduced since ground inequalities can be checked 
simultaneously. Using $\delta_{one}$, a $\wedge$_separator that splits a set of $n$ constraints into 
$n$ singletons of atomic constraints, the application of ecl is improved: 

$\wedge\_p$(dc(ecl, $\phi_{\leq,ground}$), $\delta_{one}$) 

We remark that we still need a filter for ecl since $\delta_{one}$ does not perform any 
filtering. 

As mentioned in [17], the first solvers of $S_{fin}$ and $S'_{fin}$ can be “factorized”: 

$S''_{fin}$ = maple_exp ; 
          dc(gb, $\phi_=$) ; 
          pcc(first, [(dc(maple_uni, $\phi_{=,uni}$) ; dc(eq_lin, $\phi_{=,\leq,lin}$))*, None, Id], 
                     [(dc(maple_uni, $\phi_{=,uni}$))* ; (dc(ecl, $\phi_{\leq,ground}$))*, None, Id]) 

The remaining parts of the collaborations are executed concurrently. No 
filtering is needed (Id for both sub-collaborations), and thus we do not need any 
sorter (None) since there is only one candidate after filtering, i.e., the initial 
set of constraints. We do not impose any property on the result, and we are 
interested in whichever sub-collaboration is faster (first property). Note 
that the improvements for applying ecl and maple_uni still hold in $S''_{fin}$. 



5.5 Combining Consistencies 

Box consistency [2] is a local consistency notion for interval constraints that re- 
lies on bounds of domains of variables: it is generally implemented as a (local) 
splitting of domains combined with the interval Newton method for determining 
consistent bounds of intervals. Hull consistency is another notion of consistency, 
stronger than box consistency. However, it can only be applied on primitive con- 
straints that are either part of the original CSP, or are obtained by decomposing 
the constraints of the CSP. Then, the reduction of the “decomposed” CSP is 
weaker, but also faster. The idea of [2] is to combine these two consistencies in 
order to reduce the computation time for enforcing box consistency. 

Let us consider Hull and Box, two solvers that respectively enforce hull and 
box consistency of a CSP. Then, the combination of [2] can be described by: 



(Hull ; Box)

Since we can define both solvers and collaboration in our language, we now
specify the Hull and Box solvers:

Box = (dc(box, $\phi_{\neg p}$))* and Hull = (dc(hull, $\phi_{p}$))*

where $\phi_{p}$ (respectively $\phi_{\neg p}$) filters one primitive (respectively non-primitive)
constraint together with the domain constraints (e.g., $x \in [a, b]$) associated with
each of its variables, and box (respectively hull) is a component solver that, given a
constraint c, enforces box (respectively hull) consistency of c w.r.t. each of its
variables.

We can also consider some inner strategies, such as reducing the variable 
with the largest domain. Then, Hull and Box are defined as follows: 

Box = (best {box, ^,(j)^p))* and Hull = (best {hull, (pp))* 

where “1^” selects the candidate with the largest domain. 

Note that we could once again decompose these solvers into solvers that enforce
box (or hull) consistency of one constraint with respect to one variable.
Note also that (Hull ; Box)* can represent the solver int considered in Section 5.3.
We could also think about some other description of Hull and Box
(e.g., using parallel application of solvers), but then we would no longer respect
the original combination of [2].



6 Conclusions 

We have presented a strategy language for solving CSPs via collaboration of 
solvers. A key point in this work is the introduction of basic strategy operators 
that allow the design of solvers by combining basic functions as well as the col- 
laboration of solvers by combining component solvers. We have exemplified the 
use of this language by the simulation of well-known techniques for solving CSPs 
over finite domains and non-linear constraints over real domains. To show the 
broad scope of our control language’s potential applications, we have designed
several solvers that are considered to be of a different nature (such as propagation-based
solvers, optimization over finite domains, and Gröbner bases computation). We
are currently working on the implementation of this language in order to evaluate 
the real applicability of this framework. From a more theoretical point of view, 
we are considering as further work the verification of the termination properties 
of the strategy operators. 
Abstract. A method for determining loci without using a deep algebraic
background is presented. It uses pseudodivision techniques (Wu’s
algorithm). The key idea is to make the hypothesis conditions depend 
on an indeterminate point, X. When forcing the thesis condition to be a 
consequence of hypothesis conditions, a new condition involving X ap- 
pears. That condition leads to the locus. The method is applied to prove 
a new theorem: the generalization of Simson-Steiner Theorem to 3D. 

Keywords. Automatic theorem proving. Geometric loci. Pseudodivisions. 



1 Introduction 

Applying Wu’s techniques to discover geometric theorems was already suggested 
in [2,10]. The key idea is to add new additional hypotheses to a set of original 
hypotheses (from which the thesis can’t be deduced) in order for the thesis to 
become a consequence of the extended set of hypotheses. Different authors have 
treated the problem from different points of view. 

Kapur and Mundy [5] apply this to perspective viewing. In the first step they 
use the Ritt-Wu characteristic method to obtain the characteristic set of the 
hypotheses’ ideal. The conclusion is pseudo-divided by a polynomial ideal. If the 
pseudo-remainder is 0, the thesis is a consequence of the hypotheses. Otherwise 
it is factored and some of its factors are possible candidates to become additional 
hypotheses. This approach is summarised, together with other different methods, 
in the excellent paper [4]. 

T. Recio and M. P. Vélez developed [7] a method for automatic discovery of
theorems based on Gröbner basis computation. It makes use of Hilbert’s
Nullstellensatz and clearly details the mathematics in the background (ideal/variety
duality). It consists of finding complementary hypotheses until a statement that
becomes true is obtained.

A specific use related to triangles and measure, also based on Gröbner basis
computation, can be found in [6].

Reading these ingenious articles suggested to us the idea of trying to
determine geometric loci automatically. In our first article in this line [9] we
reproved a theorem about geometric loci recently discovered [3]. This method is
generalized here.

* Partially supported by project DGES PB96-0098-C04-03 (Spain).

Wu’s algorithm is used for operating on polynomials, but without explicitly 
using the ideal/variety duality. Therefore the method can be understood and
justified without a deep algebraic background (in fact only pseudodivisions and 
linear algebraic combinations of polynomials are used). Moreover, as Hilbert’s 
Nullstellensatz is not required at any step, the base fields do not need to be 
algebraically closed. Let us observe that the method uses sufficient conditions 
at different steps, so if it is not able to provide any possible additional hypothe- 
ses, that doesn’t mean that the result can’t be reached with the help of other 
methods. 

This paper begins by describing an adaptation of Wu’s method for mechanical
geometry theorem proving (not Wu’s complete method, in Chou’s terminology
[2]) to our goal. The process for determining loci is then described in detail.
Finally, it is used to determine a locus, which can be considered a generalization
to 3D of the Simson-Steiner theorem for 2D reproved in [9].

2 Basic Algebraic Tools. Adaptation of Wu’s Algorithm 

In this section, we briefly summarize concepts relative to the basic algebraic 
ideas used in Wu’s algorithm and adapt them to introduce the algebraic tools 
used in this paper to determine geometric loci automatically. Introductory books 
for those concepts are [1,2,10]. 

Let $K[v, w, \ldots, z]$ be a polynomial ring in the indeterminates $v, w, \ldots, z$ over
the field $K$ of characteristic 0. In the usual polynomial division of polynomials
belonging to this ring, the quotient and the remainder obtained are rational
expressions, which can be non-integer expressions (i.e. variables can appear in
denominators). To avoid this inconvenience, division can be replaced by
pseudodivision.

Given the polynomials $f, g \in K[v, w, \ldots, z]$, the pseudodivision of $f$ by $g$
with respect to the variable $v$ consists of the usual polynomial division, after
replacing $f$ by its product by the multiplier

\[ m = \mathrm{lcoeff}(g, v)^{\deg(f, v) - \deg(g, v) + 1} \]

where $\mathrm{lcoeff}(g, v)$ and $\deg(g, v)$ are the leading coefficient and the degree of
$g$ with respect to the variable $v$, respectively. The quotient and remainder
obtained this way are called pseudoquotient and pseudoremainder. This
pseudoremainder will be briefly denoted $\mathrm{prem}(f, g, v)$ and the corresponding multiplier
$\mathrm{mulf}(f, g, v)$. It can be proved that the pseudoremainder and pseudoquotient
are integer expressions (variables do not appear in their denominators). As in
usual polynomial division, the pseudoremainder ($r$) and the pseudoquotient ($q$)
verify $m \cdot f = g \cdot q + r$ and hence $r$ is in the ideal $\langle f, g \rangle$ of the polynomial ring:

\[ r \in \langle f, g \rangle \subset K[v, w, \ldots, z] \qquad (1) \]

Besides, $\deg(r, v) < \deg(g, v)$. So, if $\deg(g, v) = 1$, then $v$ does not appear in $r$.
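
As a small illustration (the polynomials below are chosen here only for the example and do not come from the paper), take $f = v^2 + w$ and $g = wv + 1$ in $K[v, w]$ and pseudodivide with respect to $v$, using the multiplier $m = \mathrm{lcoeff}(g, v)^{\deg(f, v) - \deg(g, v) + 1}$:

\[
\begin{aligned}
m &= w^{2-1+1} = w^{2},\\
w^{2}\,(v^{2} + w) &= (wv + 1)(wv - 1) + (w^{3} + 1),\\
q &= wv - 1, \qquad r = \mathrm{prem}(f, g, v) = w^{3} + 1 .
\end{aligned}
\]

Ordinary division of $f$ by $g$ in $v$ would instead give the quotient $v/w - 1/w^{2}$ and the remainder $w + 1/w^{2}$, with $w$ in the denominators. Since $\deg(g, v) = 1$ here, $v$ no longer appears in $r$.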
The pseudodivision operation can be applied to reduce a finite family of
multivariate polynomials to triangular form. Starting with a list of $s$ indeterminates
or variables $[v_1, v_2, \ldots, v_s]$ over the field $K$, and a list of $s$ polynomials,
$[h_1, h_2, \ldots, h_s]$, belonging to the ring $K[v_1, v_2, \ldots, v_s]$, a process similar to
Gaussian elimination, but replacing linear operations by pseudodivisions, can
be applied until a triangular system of polynomials

\[ g_1 = g_1(v_1, v_2, v_3, \ldots, v_s),\ g_2 = g_2(v_2, v_3, \ldots, v_s),\ g_3 = g_3(v_3, \ldots, v_s),\ \ldots,\ g_s = g_s(v_s) \]

is obtained, by applying a constructive algorithm (described for example in [1]
and [2]), which will be denoted hereafter by trian:

\[ \mathrm{trian}([h_1, h_2, \ldots, h_s], [v_1, v_2, \ldots, v_s]) = [g_1, g_2, \ldots, g_s] \qquad (2) \]

These polynomials $g_i$ satisfy two conditions that will be essential later: for each
$i = 1, \ldots, s$, $\deg(g_i, v_i) > 0$ and, as a consequence of (1),

\[ g_1, g_2, \ldots, g_s \in \langle h_1, h_2, \ldots, h_s \rangle \subset K[v_1, v_2, \ldots, v_s] \qquad (3) \]

Now, starting from the lists $[v_1, \ldots, v_s]$ and $[g_1, \ldots, g_s]$ mentioned above and a new
polynomial, $th \in K[v_1, \ldots, v_s]$, the following sequence of pseudoremainders can
be considered:

\[ r_1 = \mathrm{prem}(th, g_1, v_1),\ r_2 = \mathrm{prem}(r_1, g_2, v_2),\ \ldots,\ r_s = \mathrm{prem}(r_{s-1}, g_s, v_s) \qquad (4) \]

The last pseudoremainder obtained this way, $r_s$, will be called the final
pseudoremainder and the process to compute it will be denoted fin_prem:

\[ \mathrm{fin\_prem}(th, [g_1, g_2, \ldots, g_s], [v_1, v_2, \ldots, v_s]) = r_s \]

Now, as a consequence of (2), $r_s$ can be directly obtained from $[h_1, h_2, \ldots, h_s]$ and
the process to compute it directly will be denoted final_prem:

\[ r_s = \mathrm{final\_prem}(th, [h_1, \ldots, h_s], [v_1, \ldots, v_s]) = \mathrm{fin\_prem}(th, \mathrm{trian}([h_1, \ldots, h_s], [v_1, \ldots, v_s]), [v_1, \ldots, v_s]) \]

The sequence of multipliers used in the pseudodivisions (4),

\[ m_1 = \mathrm{mulf}(th, g_1, v_1),\ m_2 = \mathrm{mulf}(r_1, g_2, v_2),\ \ldots,\ m_s = \mathrm{mulf}(r_{s-1}, g_s, v_s) \]

can be joined in a list and the process to compute it will be denoted m_list:

\[ \mathrm{m\_list}(th, [g_1, g_2, \ldots, g_s], [v_1, v_2, \ldots, v_s]) = [m_1, m_2, \ldots, m_s] \]

In the same way, as a consequence of (2), this list can be directly obtained from
$[h_1, h_2, \ldots, h_s]$ and the process to compute it directly will be denoted mulf_list:

\[ [m_1, m_2, \ldots, m_s] = \mathrm{mulf\_list}(th, [h_1, \ldots, h_s], [v_1, \ldots, v_s]) = \mathrm{m\_list}(th, \mathrm{trian}([h_1, \ldots, h_s], [v_1, \ldots, v_s]), [v_1, \ldots, v_s]) \]




160 Eugenio Roanes-Macias and Eugenio Roanes-Lozano 



In the classic Wu’s method, if $r_s = 0$, then the relation $th = 0$ (considered
as the thesis condition) follows from the set of relations $h_1 = 0, \ldots, h_s = 0$
(considered as hypothesis conditions), and the non-degeneracy conditions are
obtained from $\mathrm{mulf}(r_{i-1}, g_i, v_i) \neq 0$.

In the method for determining geometric loci presented hereafter, the same
algebraic tools used in Wu’s method are employed, but in a different way, in
order to reach a different target. The following lemma will be essential in this
process.

Lemma 1. In accordance with the notation mentioned above, the final
pseudoremainder can be expressed in the form:

\[ r_s = m_s \cdot m_{s-1} \cdots m_1 \cdot th + \sum_{i=1}^{s} \gamma_i h_i\ ; \quad \gamma_i \in K[v_1, v_2, \ldots, v_s] \qquad (5) \]

Proof. The $s$ pseudoremainders (4) can be written in the form

\[ r_1 = m_1\, th - g_1 q_1,\quad r_2 = m_2 r_1 - g_2 q_2,\quad \ldots,\quad r_s = m_s r_{s-1} - g_s q_s \]

where $q_1, q_2, \ldots, q_s$ and $m_1, m_2, \ldots, m_s$ are the pseudoquotients and the multipliers,
respectively. Substituting the value of each one of these pseudoremainders
in the following equality, $r_s$ can be expressed in the form

\[ r_s = m_s \cdot m_{s-1} \cdots m_1 \cdot th + \sum_{j=1}^{s} \beta_j g_j\ ; \quad \beta_j \in K[v_1, v_2, \ldots, v_s] \]

As, in accordance with (3), the $g_j$ can be expressed in the form

\[ g_j = \sum_{i=1}^{s} \delta_{ji} h_i\ ; \quad \delta_{ji} \in K[v_1, v_2, \ldots, v_s] \]

consequently

\[ r_s = m_s \cdot m_{s-1} \cdots m_1 \cdot th + \sum_{j=1}^{s} \sum_{i=1}^{s} \beta_j \delta_{ji} h_i \]

Now, the lemma equality follows immediately by denoting $\sum_{j=1}^{s} \beta_j \delta_{ji} = \gamma_i$.
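
For instance, for $s = 2$ the substitution in the proof can be carried out explicitly (a worked instance added for illustration, using only the relations $r_1 = m_1\, th - g_1 q_1$ and $r_2 = m_2 r_1 - g_2 q_2$):

\[
r_2 = m_2 r_1 - g_2 q_2 = m_2 (m_1\, th - g_1 q_1) - g_2 q_2 = m_2 m_1\, th - (m_2 q_1)\, g_1 - q_2\, g_2 ,
\]

so the coefficients of $g_1$ and $g_2$ are $-m_2 q_1$ and $-q_2$, and expanding $g_1$ and $g_2$ in terms of the hypothesis polynomials gives the form stated in the lemma.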

Overview of the algorithms used: In accordance with the preceding explanations,
the polynomials and lists of polynomials involved in the calculations are:

$V = [v_1, v_2, \ldots, v_s]$    (list of variables or indeterminates over $K$)
$H = [h_1, h_2, \ldots, h_s]$    (list of polynomials in $K[v_1, v_2, \ldots, v_s]$)
$G = [g_1, g_2, \ldots, g_s]$    (list of triangulated polynomials)
$th$    (polynomial in $K[v_1, v_2, \ldots, v_s]$)
$r_s$    (final pseudoremainder)
$[m_1, m_2, \ldots, m_s]$    (list of multipliers)

and the algorithms used are the following:

trian(H, V) = G
fin_prem(th, G, V) = $r_s$
final_prem(th, H, V) = fin_prem(th, trian(H, V), V) = $r_s$
m_list(th, G, V) = $[m_1, m_2, \ldots, m_s]$
mulf_list(th, H, V) = m_list(th, trian(H, V), V) = $[m_1, m_2, \ldots, m_s]$

As the calculations mentioned here are very laborious, they must be automated.
This can be done on any CAS that provides a command for computing
pseudodivisions. We have developed an implementation of these algorithms in
Maple, using techniques that are described in [8]. The code is omitted for the
sake of brevity, but it is available to anyone interested (it occupies
about 3K in its readable form).
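
Since the authors' Maple code is not reproduced, the following is only an illustrative sketch in Python, not their implementation. It assumes polynomials with integer coefficients stored as dictionaries from exponent tuples to coefficients, identifies variables by their position in the tuple, and takes an already triangular list G as input (the triangulation step trian is not sketched). The names `pdiv` and the helper functions are hypothetical; `fin_prem` here returns both the final pseudoremainder and the list of multipliers, so it plays the role of fin_prem and m_list together.

```python
# A polynomial is a dict {exponent tuple: nonzero integer coefficient}; zero is {}.

def add(p, q, sign=1):
    """Return p + sign*q."""
    out = dict(p)
    for e, c in q.items():
        c = out.get(e, 0) + sign * c
        if c:
            out[e] = c
        else:
            out.pop(e, None)
    return out

def mul(p, q):
    """Return the product p*q."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            c = out.get(e, 0) + c1 * c2
            if c:
                out[e] = c
            else:
                out.pop(e, None)
    return out

def power(p, k):
    """Return p**k for a nonzero polynomial p and an integer k >= 0."""
    out = {(0,) * len(next(iter(p))): 1}
    for _ in range(k):
        out = mul(out, p)
    return out

def deg(p, i):
    """Degree of p in variable i (-1 for the zero polynomial)."""
    return max((e[i] for e in p), default=-1)

def lcoeff(p, i):
    """Leading coefficient of p in variable i (a polynomial free of variable i)."""
    d = deg(p, i)
    return {e[:i] + (0,) + e[i + 1:]: c for e, c in p.items() if e[i] == d}

def shift(p, i, k):
    """Multiply p by (variable i)**k."""
    return {e[:i] + (e[i] + k,) + e[i + 1:]: c for e, c in p.items()}

def pdiv(f, g, i):
    """Pseudodivision of f by nonzero g in variable i: (m, q, r) with m*f == g*q + r."""
    dg, lc = deg(g, i), lcoeff(g, i)
    k = max(deg(f, i) - dg + 1, 0)
    m = power(lc, k)                      # multiplier lcoeff(g)^(deg f - deg g + 1)
    q, r = {}, dict(f)
    while r and deg(r, i) >= dg:
        t = shift(lcoeff(r, i), i, deg(r, i) - dg)
        q = add(mul(lc, q), t)
        r = add(mul(lc, r), mul(t, g), -1)
        k -= 1
    rest = power(lc, k)                   # bring q and r up to the full multiplier
    return m, mul(rest, q), mul(rest, r)

def fin_prem(th, G, V):
    """Successive pseudoremainders of th by the triangular list G; returns (r_s, multipliers)."""
    r, ms = th, []
    for g, v in zip(G, V):
        m, _, r = pdiv(r, g, v)
        ms.append(m)
    return r, ms

# Toy example with exponent tuples ordered as (v1, v2, x); x acts as a parameter.
g1 = {(1, 0, 1): 1, (0, 1, 0): -1}        # x*v1 - v2
g2 = {(0, 1, 0): 1, (0, 0, 2): -1}        # v2 - x^2
th = {(1, 0, 0): 1, (0, 0, 1): -1}        # v1 - x
G = [g1, g2]
r_s, multipliers = fin_prem(th, G, [0, 1])
print(r_s, multipliers)                   # zero remainder; multipliers x and 1
```

In this toy example the final pseudoremainder is zero and the multipliers are $x$ and $1$, so $th = 0$ follows from $g_1 = g_2 = 0$ wherever $x \neq 0$, which is the pattern of reasoning used in the steps that follow.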



3 A Previous Example to Illustrate the Method 

In order to illustrate the method, we shall begin by showing the ideas on an
easy, well-known example of a locus, the Simson-Wallace Theorem generalized by
Jakob Steiner. We shall state it as a problem and reprove this theorem using our
own method.

Problem: Let X be a point in the plane of triangle ABC and let M, N, P be the
orthogonal projections of X on the side-lines AB, BC, CA, respectively. Let us
move X in the plane of ABC in such a way that the area of triangle MNP is
kept unchanged (as a constant, a). What is the locus of points X? (Fig. 1).

In the preceding problem one can distinguish three types of points. Points 
A, B, C are freely chosen in the plane (except for exceptional positions in which 
they are collinear) and they are consequently called free points. The indeter- 
minate point X, which gives the geometric locus, is called locus point. Finally, 
points M, N, P, determined from locus and free points by geometric conditions, 
are called linked points or dependent points. 

In order to clarify the description of the method, it is convenient to distin- 
guish several steps in its execution. 

STEP 1. Select the coordinates 

For the sake of simplicity of calculations, it is convenient to select a coordinate 
system such that most of the free points have coordinates as simple as possible. 
Free points: A(0, 0), B(b, 0), C(c, e).

Locus point: X(x, y).

Linked or dependent points: M(m, 0), N(n, u), P(p, q).
STEP 2. Convert hypothesis and thesis conditions into polynomial equations

Conditions that determine linked points (starting from free points and locus
point) are called hypothesis conditions and the condition that determines the locus
point is called the thesis condition.

Hypothesis conditions:

$m - x = 0$    ($XM \perp AB$)
$(n - b) \cdot e - u \cdot (c - b) = 0$    ($N \in BC$)
$(n - x) \cdot (c - b) + e \cdot (u - y) = 0$    ($XN \perp BC$)
$(p - x) \cdot c + e \cdot (q - y) = 0$    ($XP \perp CA$)
$p \cdot e - q \cdot c = 0$    ($P \in CA$)

Thesis condition:

\[ \begin{vmatrix} m & 0 & 1 \\ n & u & 1 \\ p & q & 1 \end{vmatrix} = 2a \qquad (\mathrm{area}(MNP) = a) \]

Fig. 1. The locus of point X



STEP 3. Establish parameters and coordinates 

Free coordinates: b, c, e (coordinates of free points) 

Locus coordinates: x, y (coordinates of locus point) 

Linked or dependent coordinates: m,n,u,p,q (coordinates of linked points) 
Parameters: b, c, e, a (freely chosen variables, including free coordinates) 
Parameter conditions: $b \neq 0 \neq e$ (A, B, C are non-collinear points)

STEP 4. Input the hypotheses and thesis polynomials 

Starting from the left-hand sides of the hypothesis and thesis equalities of STEP 2 and
replacing dependent coordinates by independent variables over $\mathbb{R}$, we obtain
the hypothesis polynomials (denoted h1, h2, h3, h4, h5) and the thesis polynomial
(denoted th). Let us write the Maple code:

> h1 := m - x:
> h2 := (n - b)*e - u*(c - b):
> h3 := (n - x)*(c - b) + e*(u - y):
> h4 := p*e - q*c:
> h5 := (p - x)*c + e*(q - y):
> th := det(matrix([[m,0,1],[n,u,1],[p,q,1]])) - 2*a:

STEP 5. Compute the final pseudoremainder and factorize it 

Starting from the lists of ordered variables and hypothesis polynomials, which are
denoted V and H, respectively, the final pseudoremainder (r5) can be computed:

> V := [q,p,u,n,m]:
> H := [h5,h4,h3,h2,h1]:
> r5 := final_prem(th,H,V):

After factoring, collecting a and sorting it, r5 can be written: 

r5 = 2 • • (2 • 6 • — 2 • 6^ • • a: + 2 • 5 • + 2 • 6 • • y • (5 • c — — e^) 

— a • (6^ — 2 • 5 • c + + e^) • (c^ + e^)) 

Let us denote by rho the product of factors not depending on locus coordinates
(x, y) and let us denote by phi the only factor depending on them:

> rho := 2*e^2:
> phi := simplify(r5/rho);

(j) = 2 ■ b ■ e^ ■ — 2 ■ ■ e^ ■ X + 2 ■ b ■ e^ ■ y"^ + 2 ■ b ■ ■ y ■ {b ■ c — — e^) 

—a • (6^ — 2 • 6 • c + + e^) • (c^ + e^) 

In accordance with the lemma, $r_5$ can be expressed in the form:

\[ r_5 = m_5 m_4 m_3 m_2 m_1\, th + w_1 h_1 + w_2 h_2 + \cdots + w_5 h_5 \]

where $w_1, w_2, \ldots, w_5$ are polynomials in the variables. Now, substituting
variables by dependent coordinates, we have $h_1 = 0 \wedge \ldots \wedge h_5 = 0$. Hence, for the
thesis condition, $th = 0$, to be verified, $r_5$ must be zero. As $2 \cdot e^2 \neq 0$ (under
parameter condition $e \neq 0$), for the thesis condition to be verified, we must have
$\phi = 0$. This is the equation (with respect to x, y) of a family of concentric circles
whose radius depends on the area a.

We have seen that points X such that area(MNP) = a verify $\phi = 0$. But the
reciprocal question arises: does every point in $\phi = 0$ verify the thesis condition?
To answer it, $\phi = 0$ must be input as a new hypothesis condition.

STEP 6. Input the new hypothesis polynomial 

Consequently, the left-hand side of $\phi = 0$ must be input as a new hypothesis
polynomial, which will be denoted h6:

> h6 := phi:
STEP 7. Establish new parameters and coordinates

As point X(x, y) must verify $\phi = 0$, one of its coordinates (abscissa x, for
example) can be freely chosen, and so it will be a new parameter, and the other,
a new dependent coordinate.

New parameters: b, c, e, a, x.

New variables: m, n, u, p, q, y.

STEP 8. Compute the new final pseudoremainder 

Starting from the new lists of variables and hypothesis polynomials, which are
denoted VV and HH, respectively, the final pseudoremainder (r6) can be
computed by applying the operator final_prem of section 2.

> VV := [q,p,u,n,m,y]:
> HH := [h5,h4,h3,h2,h1,h6]:
> r6 := final_prem(th,HH,VV):

r6 = 0




STEP 9. Compute the value of the multipliers 

The list of multipliers, denoted by M, can be computed by applying the operator
mulf_list of section 2.

> M := mulf_list(th,HH,VV);

\[ M = [e,\ e^2 + c^2,\ e,\ e^2 + c^2 - 2cb + b^2,\ 1,\ e^2 b] \]

Therefore, each multiplier is nonzero, under parameter conditions ($b \neq 0 \neq e$).
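
The first five multipliers can be checked by hand. The following is a sketch, assuming that the triangulation pairs the hypotheses of STEP 2 with the variables q, p, u, n, m as shown (the triangular system is not listed in the text, so the $g_i$ below are a reconstruction):

\[
\begin{aligned}
g_1 &= h_5 = e\,q + c\,p - c\,x - e\,y, & \mathrm{lcoeff}(g_1, q) &= e,\\
g_2 &= \mathrm{prem}(h_4, h_5, q) = e\,h_4 + c\,h_5 = (e^2 + c^2)\,p - c^2 x - c\,e\,y, & \mathrm{lcoeff}(g_2, p) &= e^2 + c^2,\\
g_3 &= h_3 = e\,u + (c - b)\,n - (c - b)\,x - e\,y, & \mathrm{lcoeff}(g_3, u) &= e,\\
g_4 &= \mathrm{prem}(h_2, h_3, u) = e\,h_2 + (c - b)\,h_3 = (e^2 + (c - b)^2)\,n - b\,e^2 - (c - b)^2 x - e\,(c - b)\,y, & \mathrm{lcoeff}(g_4, n) &= e^2 + c^2 - 2cb + b^2,\\
g_5 &= h_1 = m - x, & \mathrm{lcoeff}(g_5, m) &= 1 .
\end{aligned}
\]

The thesis polynomial $m(u - q) + nq - up - 2a$ and each successive pseudoremainder have degree 1 in the variable being eliminated, and so does each $g_i$, so every multiplier is simply the corresponding leading coefficient. Over the reals these five are nonzero whenever $e \neq 0$.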

In accordance with the lemma, $r_6$ can be expressed in the form:

\[ r_6 = m_6 m_5 \cdots m_2 m_1\, th + \gamma_1 h_1 + \gamma_2 h_2 + \cdots + \gamma_6 h_6 \]

where $\gamma_1, \gamma_2, \ldots, \gamma_6$ are polynomials in the variables. As $r_6 = 0$, substituting the
variables by the dependent coordinates ($h_1 = 0 \wedge \ldots \wedge h_6 = 0$), we have

\[ 0 = m_6 m_5 \cdots m_2 m_1\, th \]

Finally, as multipliers are all non-zero, the thesis condition $th = 0$ follows.

Summary: the locus of X such that the area of the triangle MNP is the constant
a is the circle of equation $\phi = 0$. It can be easily verified that this circle is
centered at the circumcenter of triangle ABC. In particular, for a = 0 (i.e.,
M, N, P being collinear) the circle passes through point A and therefore it is the
circumcircle of ABC (Simson-Steiner Theorem).
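
The statement about the center can be cross-checked independently of the explicit form of the locus polynomial. The following sketch uses the classical formula for the area of a pedal triangle (a standard result, not taken from this paper), where $O_c$ denotes the circumcenter of ABC, $R$ its circumradius and $d$ the distance from X to $O_c$; the coordinates of $O_c$ follow from A(0, 0), B(b, 0), C(c, e):

\[
\mathrm{area}(MNP) = \frac{|R^2 - d^2|}{4R^2}\,\mathrm{area}(ABC),
\qquad
O_c = \left( \frac{b}{2},\ \frac{c^2 + e^2 - bc}{2e} \right).
\]

With the signed area used in the thesis condition, fixing a fixes $d^2$, so X lies on a circle centered at $O_c$. For a = 0 this gives $d = R$, i.e. the circumcircle.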
4 Loci: General Method of Determination

The concepts introduced through the preceding example will be generalized here. 
In order to state the ideas precisely, we shall begin by defining the concept of 
geometric loci. 

Definition 1. Let us consider the euclidean space $K^n$, over a field $K$, not
necessarily algebraically closed, containing a base field, $k$, of characteristic 0. Let
us consider, in this euclidean space:

• a subset of points, called free points, freely chosen, except for exceptional 
situations (such as non-collinear A,B,C in the preceding section) 

• an indeterminate point, X , called locus point 

• a subset of points, called linked points or dependent points, determined from 
the locus point and some free points by geometric conditions, called hypoth- 
esis conditions 

• another condition, called the thesis condition, involving some of the linked,
locus and free points and a parameter $c_0$ (a in the example of section 3)

Then, the subset of points X in $K^n$ that satisfy the thesis condition, under
hypothesis conditions and parameter conditions, is called the geometric locus
or, briefly, the locus.



Note 1. It will be supposed that all the conditions mentioned can be expressed 
by polynomial equations (otherwise the method described hereafter is not oper- 
ative). 

A computer algebraic method to determine geometric loci, based on pseudo- 
divisions, is described hereafter. The key idea is to make the hypothesis condi- 
tions depend on an indeterminate point, X (locus point). When trying to obtain 
the thesis condition from the hypothesis conditions, a new condition involving 
X appears. And this condition leads to the determination of the locus. In order 
to clarify the description of the method, it is convenient to distinguish several 
steps in its execution. 

STEP 1. Select the coordinate system 

Coordinates of the free points (concatenated coordinates of all the considered
free points) will be denoted $a_1, a_2, \ldots, a_r$. Coordinates of the locus point, X, will
be $(x_1, \ldots, x_n)$. Coordinates of all the linked points (concatenated coordinates
of all the considered linked points) will be denoted $d_1, d_2, \ldots, d_s$. For the sake of
simplicity of calculations, it is convenient to select a coordinate system such that
most of the free points have coordinates as simple as possible.
STEP 2. Convert hypothesis and thesis conditions into polynomial equations

Geometric conditions that determine linked points (starting from free points and
locus point), which we have called hypothesis conditions, are to be expressed by
polynomial equations among coordinates:

\[ h_i(a_1, a_2, \ldots, a_r; x_1, \ldots, x_n; d_1, d_2, \ldots, d_s) = 0\ ; \quad i = 1, 2, \ldots, s \qquad (6) \]

and the same has to be done with the geometric condition to be satisfied, which
we have called the thesis condition:

\[ th(c_0, a_1, a_2, \ldots, a_r; x_1, \ldots, x_n; d_1, d_2, \ldots, d_s) = 0 \qquad (7) \]

STEP 3. Establish parameters and coordinates 

Coordinates of free points ($a_1, a_2, \ldots, a_r$) are called free coordinates. Coordinates
of the locus point ($x_1, \ldots, x_n$) are called locus coordinates. Coordinates of linked
or dependent points ($d_1, \ldots, d_s$) are called linked coordinates or dependent
coordinates. Variables that can be freely chosen (including free coordinates and $c_0$) are
called parameters. Conditions on free coordinates that exclude the exceptional
positions of free points are called parameter conditions.
They are usually inequality conditions of the form $a_j \neq 0$ or a strict inequality on $a_j$.

STEP 4. Input hypothesis and thesis polynomials 

From the left-hand side of each equation in (6) and (7), we define a polynomial
by replacing linked coordinates, $d_1, \ldots, d_s$, by independent variables over $K$,
denoted $v_1, \ldots, v_s$, respectively. These polynomials will be denoted, respectively,

\[ h_i(a_1, \ldots, a_r, x_1, \ldots, x_n, v_1, v_2, \ldots, v_s)\ ; \quad i = 1, 2, \ldots, s \qquad (8) \]

\[ th(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, v_1, v_2, \ldots, v_s) \qquad (9) \]

where $c_0, a_1, a_2, \ldots, a_r \in K$ are considered as parameters and $v_1, v_2, \ldots, v_s$ as
independent indeterminates over the field $K$. Polynomials (8) and (9) will be called
hypothesis polynomials and thesis polynomial, respectively.

As a consequence, substituting every variable in (8) and (9) by its corresponding
linked coordinate, expressions (6) and (7) are obtained, respectively.

For the sake of simplicity of calculations, it is convenient to express (8) and (9)
as polynomials with integer coefficients (in $\mathbb{Z}[c_0, a_1, \ldots, a_r, x_1, \ldots, x_n][v_1, \ldots, v_s]$).

STEP 5. Compute the final pseudoremainder and factorize it

Starting from the lists of ordered variables and hypothesis polynomials, which are
denoted V and H, respectively, the final pseudoremainder ($r_s$) can be computed
by applying the operator final_prem of section 2. (An appropriate selection of
the order of variables and hypothesis polynomials can shorten its calculation.)
After factoring this final pseudoremainder polynomial, $r_s$, it can be written

\[ r_s = \rho \prod_{j} \phi_j(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, v_1, \ldots, v_s) \qquad (10) \]

where $\rho$ is the product of the factors not containing locus coordinates and the
$\phi_j$ are the factors containing them.

Therefore, the $\phi_j$ are the candidates to define equations of components of the
geometric locus and we shall refer to them as locus polynomial factors.

STEP 6. Add a new hypothesis polynomial

From now on, it will be supposed that point X is in the algebraic variety of $K^n$
defined by $\phi_j$. Hence, the new condition $\phi_j(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, d_1, \ldots, d_s) = 0$ is
to be added to the list of the $s$ hypothesis conditions considered in STEP 2. As
a consequence, the locus polynomial factor $\phi_j(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, v_1, \ldots, v_s)$ is to
be added to the list of the $s$ hypothesis polynomials considered in STEP 4, to
form the new list of $s + 1$ hypothesis polynomials $[h_1, h_2, \ldots, h_s, \phi_j]$.

STEP 7. Establish new parameters and coordinates

Consequently, one of the locus coordinates $x_1, \ldots, x_n$ can no longer be freely
chosen and therefore it must be considered as a new linked coordinate. For
instance, if $\mathrm{degree}(\phi_j, x_n) = 1$, then $x_n$ can be selected as the new linked coordinate
to be added to the old ones (this degree must be positive, and should be as small
as possible). Therefore, the variable $x_n$ is to be added to the list of $s$ variables
considered in STEP 4, to form the new list of $s + 1$ variables $[v_1, v_2, \ldots, v_s, x_n]$.

STEP 8. Compute the new final pseudoremainder 

Starting from the new variables list, $VV = [v_1, \ldots, v_s, x_n]$, the new hypothesis
polynomial list, $HH = [h_1, h_2, \ldots, h_s, \phi_j]$, and the thesis polynomial $th$, the new
final pseudoremainder can be computed by applying the operator final_prem:

\[ r'_{s+1} = \mathrm{final\_prem}(th, HH, VV) \]

In order for the process to succeed, $r'_{s+1}$ must be zero.

STEP 9. Compute the value of the multipliers

Finally, the list of multipliers, denoted by M, can be computed by applying the
operator mulf_list of section 2:

\[ \mathrm{mulf\_list}(th, HH, VV) = [m'_1, \ldots, m'_{s+1}] \]

In order for the process to succeed, all these multipliers must be
nonzero (after substituting variables, $v_1, \ldots, v_s$, by linked coordinates, $d_1, \ldots, d_s$,
and under the parameter conditions considered in STEP 3), as is established by the
following theorem.

Note 2. Steps 6 to 9 must be repeated for each one of the locus polynomial 
factors obtained in STEP 5. 

Theorem 1. In accordance with the notation above, if $\rho \neq 0$ (under parameter
conditions) and for $j = 1, 2, \ldots, \ell$ the locus polynomial factor $\phi_j$ satisfies the two
following conditions:

1) $r'_{s+1} = 0$ (the final pseudoremainder is zero)

2) $m'_1 \neq 0, \ldots, m'_{s+1} \neq 0$ (under parameter conditions)

then the geometric locus that satisfies the thesis condition (7), under hypothesis
conditions (6) and parameter conditions, is the union of the subvarieties in $K^n$
of equations

\[ \phi_j(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, d_1, \ldots, d_s) = 0, \quad j = 1, 2, \ldots, \ell \]

Proof. Let us suppose that $(x_1, \ldots, x_n)$ is a point of the locus defined by
conditions (6), that verifies (7). Let us prove that this point is in the union of the
mentioned subvarieties.

As the final remainder, $r_s$, obtained in STEP 5, verifies lemma 1 and it can
be factored as in (10), it follows that

\[ \rho \prod_{j=1}^{\ell} \phi_j = m_s \cdot m_{s-1} \cdots m_1 \cdot th + \sum_{i=1}^{s} \gamma_i h_i \]

As $\rho$ has been supposed nonzero, substituting every indeterminate, $v_i$, by its
corresponding linked coordinate, $d_i$, from (6) and (7) it follows that

\[ \prod_{j=1}^{\ell} \phi_j(c_0, a_1, a_2, \ldots, a_r, x_1, \ldots, x_n, d_1, d_2, \ldots, d_s) = 0 \]

Hence, one, at least, of these $\ell$ factors must be null. If, for instance, this occurs
for $j = u$, then $(x_1, \ldots, x_n)$ belongs to the subvariety defined by $\phi_u$.

Reciprocally, suppose now that $(x_1, \ldots, x_n)$ is a point of the union of the
mentioned subvarieties and let us prove that this point satisfies (7). If this point
is in the subvariety defined by $\phi_j = 0$, then

\[ \phi_j(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, d_1, \ldots, d_s) = 0. \qquad (11) \]

As a consequence, this equation can be added to the hypothesis conditions
and hence $\phi_j(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, v_1, \ldots, v_s)$ can be added as a new hypothesis
polynomial. Now, by applying lemma 1 to the new system of hypothesis
polynomials,

\[ r'_{s+1} = m'_{s+1} \cdot m'_s \cdot m'_{s-1} \cdots m'_1 \cdot th + \sum_{i=1}^{s} \gamma'_i h_i + \gamma'_{s+1} \phi_j \]

As $r'_{s+1}$ has been supposed zero (condition 1 of the hypothesis), it follows that

\[ m'_{s+1} \cdot m'_s \cdot m'_{s-1} \cdots m'_1 \cdot th = -\sum_{i=1}^{s} \gamma'_i h_i - \gamma'_{s+1} \phi_j \]

and substituting variables $v_1, \ldots, v_s$ by dependent coordinates $d_1, \ldots, d_s$, from (6)
and (11), it follows that

\[ m'_{s+1} \cdot m'_s \cdot m'_{s-1} \cdots m'_1 \cdot th(c_0, a_1, \ldots, a_r, x_1, \ldots, x_n, d_1, \ldots, d_s) = 0 \]

As these multipliers have been supposed to be nonzero (condition 2 of the
hypothesis), condition (7) is therefore satisfied.
5 Extension of the Simson-Steiner Theorem to 3-D

As an application of the method of determination of loci explained in the preceding
section, we shall try to extend the Simson-Steiner Theorem considered in section
3 to 3D. This, as far as we know, is a new theorem. As in section 3, we shall
state it as a problem.

Problem: Let us consider in the real euclidean space $\mathbb{R}^3$ a tetrahedron (OABC),
an arbitrary point (X) and the orthogonal projections (M, N, P, Q) of X on the
face-planes of OABC. What is the locus of X, such that vol(MNPQ) = v?

Let us adapt the problem to apply the method described in the preceding
section. Vertices (O, A, B, C) of the tetrahedron are the free points and X is the
locus point. The orthogonal projections (M, N, P, Q) are the linked or dependent
points and the geometric conditions that determine these points as projections
of X give the hypothesis conditions. Finally, the condition that the volume of
tetrahedron MNPQ is kept unchanged (equal to the constant v) is the thesis
condition. As in the preceding section, we shall distinguish nine steps.



STEP 1. Select the coordinates 

Vertices of the tetrahedron: O(0, 0, 0), A(1, 0, 0), B(0, b, 0), C(1, c, e).

Locus point: X(x, y, z).

Projections of X: M(m1, m2, m3), N(n1, n2, n3), P(p1, p2, p3), Q(q1, q2, q3)



STEP 2. Convert hypothesis and thesis conditions into polynomial equations 



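Hypothesis conditions:

$[\overrightarrow{OM}, \overrightarrow{OA}, \overrightarrow{OB}] = 0$    (O, A, B, M coplanar; triple product = 0)
$\overrightarrow{XM} \cdot \overrightarrow{OA} = 0$    ($XM \perp OA$; inner product = 0)
$\overrightarrow{XM} \cdot \overrightarrow{OB} = 0$    ($XM \perp OB$)
$[\overrightarrow{ON}, \overrightarrow{OB}, \overrightarrow{OC}] = 0$    (O, B, C, N coplanar)
$\overrightarrow{XN} \cdot \overrightarrow{OB} = 0$    ($XN \perp OB$)
$\overrightarrow{XN} \cdot \overrightarrow{OC} = 0$    ($XN \perp OC$)
$[\overrightarrow{OP}, \overrightarrow{OC}, \overrightarrow{OA}] = 0$    (O, C, A, P coplanar)
$\overrightarrow{XP} \cdot \overrightarrow{OC} = 0$    ($XP \perp OC$)
$\overrightarrow{XP} \cdot \overrightarrow{OA} = 0$    ($XP \perp OA$)
$[\overrightarrow{AQ}, \overrightarrow{AB}, \overrightarrow{AC}] = 0$    (A, B, C, Q coplanar)
$\overrightarrow{XQ} \cdot \overrightarrow{AB} = 0$    ($XQ \perp AB$)
$\overrightarrow{XQ} \cdot \overrightarrow{AC} = 0$    ($XQ \perp AC$)

Thesis condition:

$[\overrightarrow{MN}, \overrightarrow{MP}, \overrightarrow{MQ}] - 6v = 0$    (vol(MNPQ) = v)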
STEP 3. Establish parameters and coordinates

Free coordinates: b, c, e
Locus coordinates: x, y, z

Linked or dependent coordinates: m1, m2, m3, n1, n2, n3, p1, p2, p3, q1, q2, q3
Parameters (including free coordinates): b, c, e, v
Parameter conditions: $b \neq 0 \neq e$

As O, A, B, C must be non-coplanar points, parameter conditions follow from
$\mathrm{vol}(OABC) = \frac{1}{6} [\overrightarrow{OA}, \overrightarrow{OB}, \overrightarrow{OC}] = \frac{1}{6}\, b\, e \neq 0$.

STEP 4. Input the hypothesis and thesis polynomials

To make it easier, three procedures have been implemented:

escl(R,S,T,U) for inner product $\overrightarrow{RS} \cdot \overrightarrow{TU}$
vect(R,S,T,U) for cross product $\overrightarrow{RS} \times \overrightarrow{TU}$
tripl(R,S,T,U) for triple product $[\overrightarrow{RS}, \overrightarrow{RT}, \overrightarrow{RU}]$

where R, S, T, U are points in $\mathbb{R}^3$. Thus, we have (in Maple):

> h1 := tripl(O,M,A,B):
> h2 := escl(X,M,O,A):
> h3 := escl(X,M,O,B):
> h4 := tripl(O,N,B,C):
> h5 := escl(X,N,O,B):
> h6 := escl(X,N,O,C):
> h7 := tripl(O,P,C,A):
> h8 := escl(X,P,O,C):
> h9 := escl(X,P,O,A):
> h10 := tripl(A,Q,B,C):
> h11 := escl(X,Q,A,B):
> h12 := escl(X,Q,A,C):
> th := tripl(M,N,P,Q) - 6*v;
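
The three procedures are described only by what they compute, so the following Python sketch is an assumption about their behaviour, not the authors' Maple code. It works on numeric points given as 3-tuples, and the sample values b = 2, c = 3, e = 5 and the point X are chosen here purely for illustration.

```python
def sub(P, Q):
    """Vector from point P to point Q."""
    return tuple(q - p for p, q in zip(P, Q))

def escl(R, S, T, U):
    """Inner product RS . TU."""
    return sum(a * b for a, b in zip(sub(R, S), sub(T, U)))

def vect(R, S, T, U):
    """Cross product RS x TU."""
    (a1, a2, a3), (b1, b2, b3) = sub(R, S), sub(T, U)
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def tripl(R, S, T, U):
    """Triple product [RS, RT, RU] = RS . (RT x RU)."""
    normal = vect(R, T, R, U)
    return sum(a * b for a, b in zip(sub(R, S), normal))

# Coordinates of STEP 1 with the sample values b = 2, c = 3, e = 5.
O, A, B, C = (0, 0, 0), (1, 0, 0), (0, 2, 0), (1, 3, 5)
print(tripl(O, A, B, C))        # 6 * vol(OABC), which equals b*e = 10 here

# The projection of X = (4, 7, 9) on the face-plane OAB (z = 0) is M = (4, 7, 0);
# it satisfies the first three hypothesis conditions.
X, M = (4, 7, 9), (4, 7, 0)
print(tripl(O, M, A, B), escl(X, M, O, A), escl(X, M, O, B))
```

The first printed value illustrates the parameter condition: the tetrahedron is non-degenerate exactly when the product b·e is nonzero.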

STEP 5. Compute the final pseudoremainder and factorize it

Lists of variables and hypothesis polynomials are denoted V and H, respectively:

> V := [m1,m2,m3,n1,n2,n3,p1,p2,p3,q1,q2,q3]:
> H := [h2,h3,h1,h6,h5,h4,h9,h7,h8,h11,h10,h12]:
> r12 := final_prem(th,H,V):
> factor(r12);

55g2(^2 - 6e®u - e^bz^ - Ue^c^v - Qc^v - e^bc^x^z 

— e^b'^x'^y — e^bx'^z — — e^b'^z^y + e%<?zx 

—e%'^zcx + 2e^bz^cy + 2e%zcxy + e^bzx — e^bc^z'^ + e^hxz"^ 

—e^by'^x — e^by'^z + e%'^z^c — e%z^c^ — Gc'^e^v + e^b'^xy + e^b'^zy 
—e^b^z^c — 66^e®u — 6e^v — 12c^e^v — Gb^e'^c^v) 

Denote by $\rho$ the product of factors not depending on x, y, z and denote by $\phi$ the
only factor depending on x, y, z:
For the thesis condition, $th = 0$, to be verified, $r_{12}$ must be zero. Note that
$\rho$, which is a product of powers of b and e and the factor $(b^2 + 1)$, is nonzero
under parameter conditions. Hence, for the thesis condition, $th = 0$, to be verified,
$\phi$ must be zero.

We have seen that points X such that vol(MNPQ) = v verify $\phi = 0$. But the
reciprocal question arises: does every point in $\phi = 0$ verify the thesis condition?
To answer it, $\phi = 0$ must be added as a new hypothesis condition.

STEP 6. Input the new hypothesis polynomial

Consequently, $\phi$ must be added as a new hypothesis polynomial, which will be
denoted h13:

> h13 := phi:

STEP 7. Specify new parameters and coordinates

As the point X(x,y,z) must satisfy h13 = 0, its coordinates are no longer
independent: one of them (z, for example) becomes a new dependent coordinate,
and the other two are treated as new parameters.

STEP 8. Compute the new final pseudoremainder

New lists of variables and hypothesis polynomials are denoted VV and HH:

> VV := [m1,m2,m3,n1,n2,n3,p1,p2,p3,q1,q2,q3,z]:
> HH := [h2,h3,h1,h6,h5,h4,h9,h7,h8,h11,h10,h12,h13]:
> r13 := final_prem(th,HH,VV);

r13 = 0

STEP 9. Compute the value of the multipliers

> M := mulf_list(th,HH,VV);

$M = [\,1,\ b,\ b,\ 1,\ b,\ -b^2 - b^2e^2,\ 1,\ e,\ e^2 + c^2,\ -1,\ -e - b^2e,\ -e^2 - b^2e^2 - c^2,\ -e^2b^2c + e^2bc^2\,]$

Under the parameter conditions ($b \neq 0 \neq e$), every element of M is non-zero
except possibly the last one, which additionally requires $0 \neq c \neq b$ (this
condition can be avoided, treating it as a particular case or selecting another
coordinate system).

In accordance with Theorem 5, we have obtained the following extension of the
Simson-Steiner Theorem to 3D:

Let us consider in the real euclidean space $\mathbb{R}^3$ a tetrahedron (OABC), an
arbitrary point (X) and its orthogonal projections (M, N, P, Q) on the face-planes
of OABC. Then the locus of point X such that vol(MNPQ) = v (v constant)
is the cubic surface $\phi = 0$.
In particular, for $b = 1$, $c = -1$, $e = 1$, this cubic surface equation is

$$-3zx - xz^2 + 2zxy + y^2z - xy - zy + y^2x + 3x^2z + x^2y + 2z^3 + 12v = 0$$

For $v = 0$ (M, N, P, Q coplanar points), the vertices of OABC are singular points
of the surface and the edges of OABC are contained in the surface. This
surface is visualized in Fig. 2 using the package DPGraph2000.




Fig. 2. Cubic surface locus $\phi = 0$ for $v = 0$



6 Conclusions 

The method we have developed to determine geometric loci automatically has
been justified without using a deep algebraic background, and the base field does
not need to be algebraically closed.

It is useful when some of the points that are directly involved in the (thesis)
condition that defines the locus are determined by a non-empty set of (hypothesis)
conditions. Both kinds of conditions (hypothesis and thesis) must be
converted into polynomial equations.

As we have shown in the preceding section, the method can be applied to
detect geometric loci in an n-dimensional euclidean space (for a given n, fixed
in advance).

Finally, we would like to remark that we do not intend to displace the classical
methods for finding geometric loci, which use techniques of synthetic geometry. But
it is often not easy to find the key idea that solves the problem with classical
techniques. In such cases the standard technique shown here can be very useful.
Abstract. We present a new method for implicitization of parametric
curves, surfaces and hypersurfaces using essentially numerical linear
algebra. The method is applicable for polynomial, rational as well as
trigonometric parametric representations. The method can also handle
monoparametric families of parametric curves, surfaces and hypersurfaces
with a small additional amount of human interaction. We illustrate
the method with a number of examples. The efficiency of the method
compares well with the other available methods for implicitization.



1 Introduction 

The problem of implicitization for curves, surfaces and hypersurfaces is an
important problem in Algebraic Geometry with immediate practical applications
in such areas as Geometric Modeling, Graphics, and Computer Aided Geometric
Design (see [Hof89]). The implicitization problem has been addressed using a
variety of mathematical methodologies and techniques including Gröbner bases
(see [Buc88], [Kal90], [GG92], [LM94], [FHL96]), Characteristic sets (see [Li89],
[Gao00]), Resultants (see [SAG84], [GG92]), Perturbation (see [Hob91], [MC92a],
[MC92b], [SSQK94], [Hon97], [SGD97]), Multidimensional Newton formulae (see
[GV97]), Elimination theories (see [SAG85], [Wan95]) and Symmetric functions
(see [GVT95]).

We note that the inverse problem of parameterization, which is an equally 
important problem in Algebraic Geometry with direct practical applications, 
has also been investigated by many authors (see for example [AB88], [AGR95], 
[HS98], [Sch98], [SW91], [SW98]). 

Some of the above methods work for special categories of curves, surfaces and 
hypersurfaces. Moreover, some methods handle only special kinds of parametric 
representations, like polynomial, rational or trigonometric ones. 

* Work supported by the Ontario Research Centre for Computer Algebra, the Ontario
Research and Development Challenge Fund and the Natural Sciences and Engineering
Research Council of Canada.

It is important to have efficient algorithms to solve the implicitization and
parameterization problems. This is mainly because in many practical applications,
and depending on the particular circumstances, we want to use either the parametric
equations or the implicit equation.

In this paper we present a new implicitization method for curves, surfaces 
and hypersurfaces that works for polynomial, rational and trigonometric param- 
eterizations. The method uses an alternative interpretation of the implicitization 
problem as an eigenvalue problem, inspired by theoretical considerations coming 
from the area of the Calculus of Variations (see [Tro83]). The method ultimately 
uses numerical linear algebra to recover the implicit (cartesian) equation from 
the parametric equations. 



2 Description of the Problem 

In what follows the term geometric object will be used to describe a curve, a
surface, or a general hypersurface.

A parameterization of a geometric object in a space of dimension n can be
described by the following set of parametric equations:

$$x_1 = f_1(t_1, \ldots, t_k), \quad \ldots, \quad x_n = f_n(t_1, \ldots, t_k) \qquad (1)$$

where $t_1, \ldots, t_k$ are parameters and the functions $f_1, \ldots, f_n$ can be
polynomial, rational or trigonometric functions. The case n = 2 corresponds to curves,
the case n = 3 corresponds to surfaces and the case $n \geq 4$ corresponds to
hypersurfaces in general. The implicitization problem consists of computing the
polynomial cartesian (implicit) equation

$$p(x_1, \ldots, x_n) = 0, \qquad (2)$$

of the geometric object described by the parametric equations (1), which satisfies

$$p\bigl(f_1(t_1, \ldots, t_k), \ldots, f_n(t_1, \ldots, t_k)\bigr) = 0,$$

for all values of the parameters $t_1, \ldots, t_k$.



3 Implicitization as an Eigenvalue Problem 

Suppose that $g(x, y) = M(x, y)\,a$ where $a$ is the vector of (unknown) coefficients
of the polynomial $g(x, y)$ and $M(x, y) = [1, x, y, \ldots, y^m]$ where $m$ is the total
degree of $g(x, y)$. Given a parameterization $(x(s), y(s))$ of $g(x, y) = 0$, numerical
or exactly-known, we can ask for the vector $a$ that minimizes







Robert M. Corless et al. 



$$J(g) = \int_{s_0}^{s_1} w(s)\, g^{*} g \, ds = \int_{s_0}^{s_1} w(s)\, a^{*} M^{*}(x(s), y(s))\, M(x(s), y(s))\, a \, ds$$

(for a specified positive weight function $w(s)$) subject to the constraint $\|a\|^2 = 1$.
Forming the Lagrange multiplier we get the standard Rayleigh-Ritz problem of
minimizing $K(a) = J + \lambda(1 - a^{*}a)$.

Consider

$$K(a + \Delta a) - K(a) = 2\,\Delta a^{*}(G - \lambda I)\,a + \Delta a^{*}(G - \lambda I)\,\Delta a \qquad (3)$$

where G is the (Hermitian, positive semi-definite) structured matrix

$$G = \int_{s_0}^{s_1} w(s)\, M^{*} M \, ds, \qquad (4)$$

and we therefore see that if

$$[G - \lambda I]\, a = 0, \qquad (5)$$

then $\lambda$ is an eigenvalue of G with eigenvector $a$; therefore $\lambda \geq 0$ because G is
positive semidefinite. This gives $K(a + \Delta a) - K(a) = \Delta a^{*}(G - \lambda I)\,\Delta a$. Now if $\lambda$
is the smallest eigenvalue of G, the eigenvalues of $(G - \lambda I)$ are all non-negative,
and hence

$$K(a + \Delta a) - K(a) = \Delta a^{*}(G - \lambda I)\,\Delta a \geq 0. \qquad (6)$$

Thus our minimum will occur at an eigenvector of G corresponding to its smallest 
eigenvalue. 

Moreover, the standard theory [Tro83, p. 343] shows that the eigenvalue $\lambda$ is
exactly $J(M(x, y)\,a)$ for the corresponding eigenvector $a$.

Finally, equality in (6) occurs if and only if $\Delta a$ is also an eigenvector
corresponding to $\lambda$. This is possible only if the smallest eigenvalue is multiple.

More generally, the conditioning of these eigenvectors depends on the
distances to the nearest other small eigenvalues [GVL95]. Errors are amplified by
a factor of essentially $1/(\lambda_k - \lambda_m)$.

If an implicitization exists with the support M(x,y), then finding a vector 
in the null space of G finds this implicitization. 



4 Description and Implementation of the Method 

In this section we describe the algorithmic steps of the method in pseudo-code.
We have implemented the method in MAPLE and tested our implementation
with all the examples given in the next section.

Input: Parametric equations of the form (1) for specific n, k.

Output: The cartesian (implicit) equation for the geometric object
represented by these equations.

Step 1: choose m (total degree of the implicit equation).

Step 2: construct the line matrix v of all power products of total degree
up to m in the variables $x_1, \ldots, x_n$.

Step 3: compute the matrix $M = v^{T} \cdot v$.

Step 4: substitute $x_1, \ldots, x_n$ by their parametric representations (1)
in the matrix M.

Step 5: integrate the elements of the matrix M successively over each
parameter $t_1, \ldots, t_k$.

Step 6: compute a null-vector nv of the matrix resulting from Step 5.

Step 7: recover the implicit equation as the product $v \cdot nv$.



Several comments are in order to clarify certain points in the above descrip- 
tion of our implicitization method. 

1. During the integration step, care should be taken so as to avoid integrals 
with infinite values or divergent integrals. Such degenerate cases may occur 
when for example the parametric equations contain denominators or trigono- 
metric functions. Usually it is an easy matter to choose suitable intervals of 
integration. In the case of rational parametric equations this is the problem 
of base points (see for example [CG92], [MC92a], [MC92b]). 

2. Another issue related to the integration step, is that sometimes it is in- 
evitable to perform the integrations numerically, simply because the analytic 
expression is either too complicated to be of any use, or is not elementary. 
When numerical integration is employed, the resulting matrix will have float- 
ing point elements and one should be very careful about how to compute 
correctly the nullspace. Indeed, it may happen that according to the preci- 
sion used for the computation, one obtains one or more vectors as a basis 
for the nullspace. 

3. In the last step of the algorithm, we obtain the cartesian equation in the
variables $x_1, \ldots, x_n$, but this will not always be a polynomial with integer
coefficients. Some more processing is necessary to discover the integer
relations among the coefficients and finally multiply by the appropriate number
to unveil a polynomial with integer coefficients. This can be done using
integer relation-finding algorithms as they are implemented in Maple.
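The seven steps can be sketched in a few lines of Python for the simplest setting: a polynomial parameterization of a plane curve, where every integral can be evaluated exactly. This is only an illustration under stated assumptions, not the authors' Maple implementation. The function names, the choice of the parabola x = t, y = t^2 as input, the integration interval [0, 1] and the use of exact rational arithmetic in place of numerical linear algebra are all choices made here.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials in t (coefficient lists, lowest degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_pow(p, e):
    r = [Fraction(1)]
    for _ in range(e):
        r = poly_mul(r, p)
    return r

def integrate(p, lo, hi):
    """Exact integral of the polynomial p over [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))

def monomials(m):
    """Step 2: exponent pairs (i, j) of x^i y^j with i + j <= m."""
    return [(d - j, j) for d in range(m + 1) for j in range(d + 1)]

def gram_matrix(x_t, y_t, m, lo, hi):
    """Steps 3-5: substitute the parameterization into v^T v and integrate."""
    v = [poly_mul(poly_pow(x_t, i), poly_pow(y_t, j)) for i, j in monomials(m)]
    return [[integrate(poly_mul(a, b), lo, hi) for b in v] for a in v]

def null_space(A):
    """Step 6: basis of the null space, by exact Gauss-Jordan elimination."""
    A = [list(row) for row in A]
    n, pivots, r = len(A[0]), [], 0
    for c in range(n):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [a / A[r][c] for a in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(n) if c not in pivots):
        vec = [Fraction(0)] * n
        vec[fc] = Fraction(1)
        for row, pc in zip(A, pivots):
            vec[pc] = -row[fc]
        basis.append(vec)
    return basis

# Step 1: m = 2, so v = [1, x, y, x^2, xy, y^2].
G = gram_matrix([0, 1], [0, 0, 1], 2, 0, 1)   # x = t, y = t^2 on [0, 1]
nv = null_space(G)
print(nv)   # Step 7: coefficients of the implicit equation over v
```

For this input the null space is one-dimensional and its basis vector has entries -1 at y and 1 at x^2, which is the expected implicit equation x^2 - y = 0. With rational or trigonometric parameterizations the integrals generally have to be evaluated numerically, and the exact elimination above would be replaced by a numerical rank and null-space computation, as comment 2 warns.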

5 Application of the Method 

In this section we give some examples to illustrate the use of the
implicitization method for curves and surfaces. The method works for rational as well as
for trigonometric parameterizations and can also be used to recover cartesian
equations for monoparametric families of curves or surfaces.

Example 1. (The Descartes Folium)

Consider the following parametric equations for the plane algebraic curve known
as the Descartes Folium:

$$x = \frac{3t}{1 + t^3}, \qquad y = \frac{3t^2}{1 + t^3} \qquad (7)$$

Choose m = 3, define the line matrix $v = [1, x, y, x^2, xy, y^2, x^3, x^2y, xy^2, y^3]$
and form the associated 10 × 10 matrix $M = v^{T} \cdot v$:

$$M = \begin{pmatrix} 1 & x & \cdots & y^3 \\ x & x^2 & \cdots & xy^3 \\ \vdots & \vdots & \ddots & \vdots \\ y^3 & xy^3 & \cdots & y^6 \end{pmatrix}$$

We substitute equations (7) into the matrix M to obtain a new matrix M'.
We integrate all the elements of the matrix M' with respect to t over the interval
[0, 2]. Since the denominators in equations (7) have a singularity at t = -1, we
choose an interval which does not contain that point. The integrations can be
performed symbolically or numerically. We prefer the numerical evaluation in
this example, because the analytical expressions for the integrals yield a fairly
complicated matrix. The difference in the computing times between calculating
the nullvector for the analytical and the numerical matrix is dramatic. The
numerical rank of the resulting matrix is 9, which means that its nullspace is
of dimension 1 and thus generated by one nullvector. The Maple environment
variable Digits is set to 15 in order to achieve a better accuracy. We compute
the nullvector and multiply it by v, to obtain the equation

$$-0.9045\, xy + 0.3015\, x^3 + 0.3015\, y^3 = 0,$$

which shows that the implicit equation of the Descartes Folium is:

$$x^3 + y^3 - 3xy = 0.$$

The most time-consuming part of the computation (2.5 sec) is the numerical 
evaluation of the 100 definite integrals involved. 
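The result of this example is easy to cross-check. The sketch below assumes the standard rational parameterization of the folium, x = 3t/(1 + t^3), y = 3t^2/(1 + t^3), evaluates the implicit polynomial exactly at sample parameter values in [0, 2], and rescales the three numerical coefficients quoted above. The sample points and helper names are illustrative choices.

```python
from fractions import Fraction

def folium(t):
    """Assumed standard parameterization of the Descartes Folium."""
    t = Fraction(t)
    d = 1 + t ** 3
    return 3 * t / d, 3 * t ** 2 / d

def implicit(x, y):
    return x ** 3 + y ** 3 - 3 * x * y

# Exact residuals at t = 0, 1/4, ..., 2 (the interval used for the integration).
samples = [Fraction(k, 4) for k in range(9)]
residuals = [implicit(*folium(t)) for t in samples]

# Dividing the numerical nullvector entries by 0.3015 exposes the integer ratios.
numeric = {"xy": -0.9045, "x^3": 0.3015, "y^3": 0.3015}
scaled = {name: round(c / 0.3015, 6) for name, c in numeric.items()}

print(all(r == 0 for r in residuals), scaled)
```

All residuals vanish exactly, and the rescaled coefficients are -3, 1 and 1 up to rounding, in agreement with the stated equation.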
Example 2.

Consider the following trigonometric parametric equations of the unit sphere in
three-dimensional space:

$$x = \cos\theta \sin\phi, \qquad y = \cos\theta \cos\phi, \qquad z = \sin\theta.$$

We form the 10 × 10 matrix with first row $[1, x, y, z, x^2, xy, xz, y^2, yz, z^2]$ and
integrate from 0 to $\pi/3$ for $\theta$ and $\phi$ successively. The resulting matrix has rank
9 and its nullspace is spanned by the vector [-1, 0, 0, 0, 1, 0, 0, 1, 0, 1]. This gives
directly the cartesian equation of the unit sphere:

$$-1 + x^2 + y^2 + z^2 = 0.$$

The method also works for rational parameterizations of the unit sphere.

Example 3. 

Let a be a parameter and consider the family of curves defined by the following
rational parametric equations:

$$x = \frac{t\,(a - t^2)}{(1 + t^2)^2}, \qquad y = \frac{t^2 (a - t^2)}{(1 + t^2)^2}.$$

We compute the cartesian equation for some values of a and by extrapolation
we have that the general monoparametric cartesian equation for the family of
curves is:

$$x^4 - a\,y x^2 + 2 x^2 y^2 + y^3 + y^4 = 0.$$

Now an easy computation shows that this equation is indeed valid for arbitrary a.
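The "easy computation" can be illustrated with exact arithmetic. The sketch below assumes that the family is x = t(a - t^2)/(1 + t^2)^2, y = t^2(a - t^2)/(1 + t^2)^2, which is a reading of the parametric equations consistent with the legible fragments and with the implicit equation, and that the implicit equation is x^4 - a y x^2 + 2 x^2 y^2 + y^3 + y^4 = 0. It then checks the identity on a grid of values of a and t; the grid itself is arbitrary.

```python
from fractions import Fraction

def curve(a, t):
    """Assumed parameterization of the monoparametric family."""
    a, t = Fraction(a), Fraction(t)
    d = (1 + t ** 2) ** 2
    return t * (a - t ** 2) / d, t ** 2 * (a - t ** 2) / d

def family(a, x, y):
    """Candidate implicit equation, written as (x^2 + y^2)^2 - y (a x^2 - y^2)."""
    return x ** 4 - a * y * x ** 2 + 2 * x ** 2 * y ** 2 + y ** 3 + y ** 4

checks = [family(a, *curve(a, Fraction(k, 3)))
          for a in range(-3, 4)
          for k in range(-9, 10)]
print(all(c == 0 for c in checks))
```

The identity also follows by hand: with t = y/x one gets x^2 + y^2 = x^2 (1 + t^2), and substituting the expression for x reduces (x^2 + y^2)^2 = y (a x^2 - y^2) to a tautology.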

Example 4. 

Unfortunately other monoparametric families present bigger difficulties. Con- 
sider the family of curves given by the polynomial parametric equations: 

X = t 1 , y = t 1 ( 8 ) 

where n is a parameter. We compute the cartesian equation for some values of
n and we see that we have to distinguish two cases according to the parity of n,
for the general monoparametric cartesian equation of the family of the curves.
For n even the cartesian equation has n + 1 terms, and is of the form:

k 

x'^ + ^{a^ + biy)x^ + y'^ -nxy = Q, ^ ( 9 ) 

i=2 

where the $a_i$, $b_i$ are constants. For n odd the cartesian equation has n + 3 terms,
and is of the form:

x'^ + 2x’‘~^^ + '^{c^ + diy)x’‘ -2y - y'^ = 0 , k= ^ ^ , (10) 

2=1 
where the $c_i$, $d_i$ are constants. We have not solved the problem completely, unlike
the situation in the previous example. But now for an arbitrary positive integer
n we can substitute the parametric equations (8) into (9) for even n (resp. into
(10) for odd n) and determine the unknown coefficients $a_i$, $b_i$ (resp. $c_i$, $d_i$) by
solving a highly structured linear system of 2n - 1 equations in n - 2 (resp. n - 1)
unknowns.

Example 5. 

The following example is taken from [GC92] . Consider the parametric equations 
for a Bezier curve: 



- 12f5 + 32t3 + 24t2 + I2t 

~ - 3 + 3 + 3 + 3 1 + 1 

and 

24t3 + 54t3_54t3-54t2 + 30t 

~ - 3 + 3 + 3 12 3 ^ 1 

We form the 10 × 10 matrix with first row $[1, x, y, x^2, xy, y^2, x^3, x^2y, xy^2, y^3]$ and
perform the integration from t = 1 to t = 2. The resulting equation to an
accuracy of 7 decimal digits is:

$$0.0001644979\, y^3 + (0.005604679 - 0.001665542\, x)\, y^2 + (-0.001110361\, x - 0.3527776 - 0.00003965575\, x^2)\, y + 0.8819439\, x + 0.02516157\, x^2 - 0.3115356\, x^3 = 0.$$

If we normalize this equation by dividing by the smallest coefficient in absolute
value, then we get the following equation, in which some of the integer relations
between the coefficients of the final equation appear already:

$$4.148147\, y^3 + (141.3333 - 42.00001\, x)\, y^2 + (-1.0\, x^2 - 8896.001 - 28.0\, x)\, y - 7856.001\, x^3 + 22240.0\, x + 634.4999\, x^2 = 0.$$

The final step is provided by either Maple (using the convert/rational com- 
mand in conjunction with a small value of the Digits environment variable, say 
5) or RevEng, the newer version of the Inverse Symbolic Calculator available 
on-line from the CECM (http://www.cecm.sfu.ca/MRC/INTERFACES.html). 
We disregard the integer coefficients in the above equation and after processing
the remaining three non-integer coefficients we discover that:

$$4.148147 = \frac{112}{27}, \qquad 141.3333 = \frac{424}{3}, \qquad 634.4999 = \frac{1269}{2}.$$

Multiplying the equation by the lcm of the denominators, which is 54, we get
the final cartesian equation:




Numerical Implicitization of Parametric Hypersurfaces with Linear Algebra



$$224\, y^3 + (7632 - 2268\, x)\, y^2 + (-1512\, x - 480384 - 54\, x^2)\, y + 34263\, x^2 - 424224\, x^3 + 1200960\, x = 0,$$

which can easily be verified with Maple.
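The rational-recovery step can be reproduced outside Maple. The sketch below is a stand-in for the convert/rational call mentioned in the text, not a description of it: it uses `Fraction.limit_denominator` from the Python standard library, with a denominator bound of 100 chosen here as an assumption, to recover the three non-integer coefficients and the common multiplier.

```python
from fractions import Fraction
from math import gcd

# The three non-integer coefficients of the normalized equation.
decimals = ["4.148147", "141.3333", "634.4999"]

# Closest fractions with small denominators (the bound 100 is an assumption).
rationals = [Fraction(d).limit_denominator(100) for d in decimals]

# Least common multiple of the denominators.
common = 1
for r in rationals:
    common = common * r.denominator // gcd(common, r.denominator)

# Coefficients of y^3, y^2 and x^2 after clearing denominators.
cleared = [r * common for r in rationals]
print(rationals, common, cleared)
```

The recovered fractions are 112/27, 424/3 and 1269/2, their denominators have least common multiple 54, and the cleared coefficients 224, 7632 and 34263 are the ones appearing in the final cartesian equation. The remaining coefficients follow by multiplying the near-integer values (-42, -28, -1, -8896, -7856, 22240) by 54.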

Example 6. 

The following example is taken from [SAG85]. Consider a rational cubic Bezier
curve with control points

$$P_0 = (4, 1), \quad P_1 = (5, 6), \quad P_2 = (5, 0), \quad P_3 = (6, 4),$$

with respective weights $w_0 = 1$, $w_1 = 2$, $w_2 = 2$, $w_3 = 1$. The parametric
equations of the curve are given by:

$$x = 2t^3 - 18t^2 s + 18 t s^2 + 4 s^3$$
$$y = 39 t^3 - 69 t^2 s + 33 t s^2 + s^3$$
$$z = -3 t^2 s + 3 t s^2 + s^3$$

We choose to work with a 20 × 20 matrix (total degree 3) and integrate from 0
to 1 for t and s successively. The resulting matrix has rank 19 and its nullspace
is spanned by a vector with small rational coefficients. Multiplying by the lcm
of the denominators we get the following cartesian equation:

$$224\, y^3 - 7056\, y^2 x + 33168\, y^2 z + 60426\, y x^2 - 562500\, y x z + 1322088\, y z^2 - 156195\, x^3 + 2188998\, x^2 z - 10175796\, x z^2 + 15631624\, z^3 = 0.$$
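As a sanity check on this example, the homogeneous cubic can be evaluated on the parameterization with integer arithmetic. The sketch assumes the parametric equations are the homogeneous form of the rational Bezier curve, that is, the weighted control points combined with the Bernstein basis written in t and s - t, and that the last four coefficients of the cartesian equation belong to x^3, x^2 z, x z^2 and z^3; both readings are reconstructions. The sample grid is arbitrary.

```python
def point(t, s):
    """Homogeneous coordinates (x, y, z) of the curve point with parameters (t, s)."""
    x = 2 * t**3 - 18 * t**2 * s + 18 * t * s**2 + 4 * s**3
    y = 39 * t**3 - 69 * t**2 * s + 33 * t * s**2 + s**3
    z = -3 * t**2 * s + 3 * t * s**2 + s**3
    return x, y, z

def cubic(x, y, z):
    """Left-hand side of the homogeneous cartesian equation."""
    return (224 * y**3 - 7056 * y**2 * x + 33168 * y**2 * z
            + 60426 * y * x**2 - 562500 * y * x * z + 1322088 * y * z**2
            - 156195 * x**3 + 2188998 * x**2 * z
            - 10175796 * x * z**2 + 15631624 * z**3)

# t = 0, s = 1 gives the first control point and t = s = 1 the last one.
print(point(0, 1), point(1, 1))

values = [cubic(*point(t, s)) for t in range(-4, 5) for s in range(-4, 5)]
print(all(v == 0 for v in values))
```

The endpoints come out as (4, 1, 1) and (6, 4, 1), that is, P0 and P3 in homogeneous coordinates, and the cubic vanishes at every sampled parameter pair.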



6 Conclusion 

We present a new method for doing implicitization of curves, surfaces and
hypersurfaces, using essentially linear algebra. The method works for polynomial,
rational and trigonometric parametric equations. The method also applies to
monoparametric families of parametric curves, surfaces and hypersurfaces, with
a small amount of extra work. The method is quite efficient due to the fact that
it does not use Gröbner bases or multivariate factorization computations. The
efficiency of the method can be improved by taking into account the special
structure of the matrices involved in the computation.

Abstract. This position paper proposes a mathematical modeling approach
for a certain class of connectionist network structures. Investigation
of the structure of an artificial neural network (ANN) in that
class (paradigm) suggested the use of geometric and categorical modeling
methods in the following sense. A (noncommutative) geometric space
can be interpreted as a so-called geometric net. To a given ANN a
corresponding geometric net can be associated. Geometric spaces form a
category. Consequently, one obtains a category of geometric nets with a
suitable notion of morphism. It is natural to interpret a learning step of
an ANN as a morphism, thus learning corresponds to a finite sequence
of morphisms (the associated networks are the objects). An associated
(“local”) geometric net is less complex than the original ANN, but it
contains all necessary information about the network structure. The
association process together with learning (expressed by morphisms) leads
to a commutative diagram corresponding to a suitable natural transformation.
Commutativity can be exploited to make learning “cheaper”.
The simplified mathematical network model was used in ANN simulation
applied in an industrial project on quality control. The “economy”
of the model could be observed in a considerable increase of performance
and decrease of production costs.



1 Introduction 

In this note we give a brief overview of work (in progress) dealing with a
mathematical approach for modeling the network structure of an artificial neural
network (ANN). The work originates in a fruitful cooperation with H. Geiger, who
introduced his own network paradigms with neuron types developed on the basis
of neurophysiological insight into the biological information processing of cells. The
architecture of these networks is very flexible, allowing a modularized design of
network structures, integrating cascades of various neuron layers having different
functionality (like feature extraction). Feedback loops can be modeled, too.
The main neuron models are CCM (Conductivity Coupled Model), RCM (Rate
Coded Model), and SSM (Single Spike Model). The latter is a rather complex
neuron type (with many parameters) used to model time-dependent behavior
in ANNs. In principle, all kinds of learning can be applied. The main learning
rules used in Geiger’s paradigms are forced learning (processing activities of
neurons), delta rule learning, and Hebbian learning (both processing synaptic
weights). The first version of a simulator for these network types was
implemented by H. Geiger (late 1970s) in terms of a Network Command Interpreter
(NCI). Under his guidance this tool was developed further (by several diploma
and doctoral students at TU Munich and colleagues in his company), resulting
in the simulator “NeuroTools” which, since then, has been successfully applied
in various industrial projects by Geiger and his coworkers. Later, in our group,
a new command language was developed (in close cooperation with Geiger and
his group) improving the older tool. This new version, “NeuroTools 6.0”, is now
being tested.

In this article our interest focuses on the net structure of an ANN.
Consequently, neurons are abstractly modeled as nodes of a directed (and colored)
graph, independent of their specific neuron types. Therefore the
previously mentioned neuron models and rules are not of relevance in this
contribution. A detailed presentation will be included in [GP]. Analyzing how the
previously mentioned networks are designed by their inventor, we can observe that
their architecture is amenable to mathematical modeling in a natural way. It has
turned out in the course of our cooperation that methods from geometry
(so-called “noncommutative geometric spaces”) and category theory can be applied
to model the network structures. The rough idea can be described as follows.

It can be observed that the corresponding ANNs are regularly structured. 
That means, considered from the “view point” of each node, the “local” structure 
(including connections) of the network is the same. Mathematically, one speaks of 
a “pointed space” when selecting a distinguished point in a space and describing 
the structure “locally” . In this sense, the whole ( “global” ) network structure can 
be homogeneously described by its pointed spaces, i.e. the essential information 
for structuring the net is given locally. Which roles do geometry and category 
theory play - how do they arise ? 

The regularly structured networks can be interpreted as directed colored
graphs. Input and output layers do not have to be distinguished. Concentrating
on the local and global network structure, it turns out that it is reasonable
to associate an (abstract) “geometric net” to a given ANN, globally, and to
introduce a smaller associated net that reflects the local network structure
(containing the essential information). A (noncommutative) geometric space has a
natural interpretation as a directed colored graph, which we call a geometric net. This
suggests interpreting an ANN structure as a geometric net. Noncommutative
geometric spaces form a category (NCG) with geometric spaces as objects
and structure preserving maps as morphisms. This, in turn, leads to the category
of geometric nets and to the interpretation of an ANN structure as an object of
a corresponding category. Now, from this point of view, we would like to model a
learning step as a morphism between networks (objects) and learning as a finite
sequence of morphisms in such a category.

Subsequently, we introduce the basic mathematical notions and apply them
in the previously mentioned way. For details we refer to the literature. A detailed 
exposition with much more material on the ANN paradigms and the mathemat- 
ical modeling aspects is in preparation ([GP]). A very short version containing 
the first basic ideas is [GP95] . At the end of this contribution we briefly present 
an industrial application on optical quality control carried out by H. Geiger (cf. 
[Gei94]). Deployment of the mathematical model on a macro level for purposes 
of implementation in the simulation ( “NeuroTools” ) and application of the net- 
works leads to a considerable improvement of performance. Shortening of pro- 
gram code and run time speed-up can be achieved and less storage space is 
needed. Summarizing, the production costs in that project could be enormously 
reduced. This is how the simplifying mathematical model has an “economic” 
impact on applications. 

A final remark on the use of categorical notions. A priori, there is no need at 
all for using categorical modeling. But in fact, it was “categorical thinking” that 
opened our eyes for simplifications which, together with the geometric viewpoint, 
led to the interpretation of a network as an “object” and a learning step as a 
“morphism” . 

Concluding the introduction, we point to the interesting article by R. Eckmiller
([Eck90]) where he suggests the use of various mathematical methods,
especially from geometry, in the field of connectionist network modeling. In some
respects our work is in this spirit.



2 Remark on Geiger’s Network Paradigm 

We recall that this contribution concentrates on modeling network structures. 
We do not consider specific neuron types and learning rules. Neurons are just 
abstractly modeled as nodes of an associated network and learning will be inter- 
preted in terms of certain mappings. Thus, our approach has more of a qualitative 
nature. 

As already indicated in the introduction, the ANN structures developed and
(industrially) used by H. Geiger (and implemented for simulation in “NeuroTools”)
are regularly structured in the following sense. As illustrated in the
figure, the cascaded network structure consists of an input retina $FL_0$ followed
by several (problem-dependent) feature layers $FL_1, \ldots, FL_n$, the receptive fields
(feature detectors).

Of basic importance is the observation that the networks are locally
pre-structured in a way which is amenable to a geometric interpretation in the sense
of local configurations. A local description of the structure means selecting a
fixed (but arbitrary) node (neuron), say $x_0$, of the network and specifying the
connections (directed edges) with other nodes in a well defined neighborhood. We
call this a pointed network (pointed space, in general mathematical modeling).
We can take a special feature detector as in the figure (“local view”), for example.

Considering a pointed network in a node $x_0$, the local structure consists of
configurations. In a geometric sense this means triangles (3-tuples), tetrahedrons
(4-tuples), more generally higher dimensional abstract simplices (n-tuples)
consisting of neighboring nodes of $x_0$ and corresponding connecting synaptic
(colored) edges. This yields a very rich structure from a geometric standpoint.

Fig. 1. A multilayer network (cascades): global multi-layer network

For illustration we refer to the following figures representing a local feature 
detector and its associated (abstract) local net (the weights of the directed edges 
are not depicted) . This special picture corresponds to a local edge detector that 
will be discussed later in section 4. 

As we have mentioned above, in each neuron (node) of a layer the corre- 
sponding pointed network (local view) is the same. This leads to a homogeneous 
structure from a global point of view. Actually, this is where we turn to our 
geometric model: to every pointed network a geometric net will be associated 
encoding all the relevant information of the ANN locally, but since this situation 
is the same in each node, the global network can be reconstructed (“shifting” 
the pointed local network over all other nodes). Geometrically, this amounts to
shifting all the geometric configurations in $x_0$ to all other nodes.

Now, a crucial aspect is that learning can be done locally, replacing the
learning procedure of the entire (global), possibly large, neural network. This
reduces the complexity of learning considerably. Later we will come back to that
aspect. By the way, simulation of ANNs consisting of about one million neurons
or more is no problem for “NeuroTools”.

Fig. 2. “Local view” of a network (feature detectors): local feature detector and associated local net (abstraction)

In practice (and in the simulator NeuroTools) nodes of a particular ANN are 
placed in a regular grid. As a 2-dimensional grid we can use, for example, Z x Z. 
Furthermore, Geiger’s modeling approach uses an “embedding principle”, i.e. a 
real (finite) ANN, designed for a particular application, is embedded in a larger 
network. This allows us to verify the previously mentioned homogeneously dis- 
tributed local regular structure of a corresponding ANN - in each node (pointed 
net, “local view”) there is the same net structure. We point out here, that for 
our mathematical model of a network structure we do not resort to an under- 
lying grid. The associated nets will be abstract mathematical models, but they 
contain all the relevant information of the corresponding network (set of nodes 
and synaptic connection structure). 



3 Some Basic Mathematical Notions 

In this section we develop the basic mathematical notions, especially from
geometry, and show how the concept of a noncommutative geometric space naturally
arises in our study of networks. Accordingly, we introduce the notion of a
geometric net. Such geometric nets can be associated to connectionist networks.
As already indicated, we will use the language of category theory as a “linguistic
basis”. Experience shows that “categorical modeling” can lead to the effect
of a “formal economy”. (We note here that our notion of a category has to be
distinguished from the categories discussed in the cognitive sciences. We refer
to the remarks in [NKK94], section 12.3; this has nothing in common with our
work below.) Thus, we are interested in a category of associated geometric nets
that incorporates all the necessary information which is needed to represent a
corresponding connectionist network mathematically. Additionally, we aim at
modeling learning in terms of the categorical notion of morphisms.

In [NKK94] a mathematical definition of an ANN is introduced which fits
the common “classical” neural network paradigms. For our purposes here and
for future use we propose the following general (preliminary) definition (more
details will be discussed in [GP]). A network (ANN) consists of an underlying
set of nodes (“points”, in terms of elements of a corresponding space) $X$ and a
bivariate map $<,> : X \times X \to R$, where $R$ is a set. The map $<,>$ assigns to
each pair of nodes (points) $x, y$ a “weight” (“color”) $<x,y> \in R$. In classical
ANNs this is a numerical value, but we want to be able to model more general
data for an ordered pair (“edge”) $(x,y)$, like predicates, states, activities, etc.
$X$ is partitioned into disjoint subsets $X_1, \ldots, X_n$, called “layers”. There are two
distinguished layers, say $X_1$ (“input layer”) and $X_n$ (“output layer”), being of
relevance for real applications. The other layers are often called “hidden layers”.
A layer $X_i$ can be interpreted as a (sub-)network of $X$ given by

$<,>_i : X_i \times X_i \to R_i$ ($<,>_i$ is the corresponding restriction of $<,>$ and $R_i$
the corresponding subset of $R$). It is clear that the “local” and “global” network
structure is determined by $<,>$. In practical applications a node (point) of $X$
will be a “neuron” (“processing element”); it has to be specified by corresponding
parameters and information processing capabilities (e.g. threshold unit, transfer
functions). In this contribution we do not deal with these aspects; we are only
interested in the network structure (“topology”). As discussed below, $X$ will be
interpreted as an object of a suitable category and learning as (a sequence of)
morphisms.
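The preliminary definition above can be made concrete with a small sketch. The node names, weights, layers and the `NO_EDGE` marker are hypothetical and chosen only for illustration; the definition itself leaves the set R open.

```python
from itertools import product

# Hypothetical toy network: names, weights and layers are illustrative only.
nodes = ["x1", "x2", "h1", "y1"]
layers = [{"x1", "x2"}, {"h1"}, {"y1"}]  # X_1 (input), one hidden layer, X_n (output)

NO_EDGE = None  # assumed marker for "no connection"; R may be any set

weights = {("x1", "h1"): 0.5, ("x2", "h1"): -1.0, ("h1", "y1"): 2.0}


def bracket(x, y):
    """The bivariate map <x, y> : X x X -> R."""
    return weights.get((x, y), NO_EDGE)


def is_partition(layers, nodes):
    """Layers must be pairwise disjoint and cover the node set X."""
    union = set().union(*layers)
    return union == set(nodes) and sum(len(layer) for layer in layers) == len(nodes)


def restrict(layer):
    """Restriction <,>_i of <,> to one layer X_i, as a dict on X_i x X_i."""
    return {(x, y): bracket(x, y) for x, y in product(sorted(layer), repeat=2)}
```

Representing the map as a function rather than a matrix keeps the sketch open to non-numerical colors, which is the generality the definition asks for.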



3.1 Some Basic Notions from Category Theory 

For convenience of reading we recall the basic notions from category theory, like 
category, functor, natural transformation, as we need them for later use. 

Definition 1. A category $\mathcal{A}$ consists of a class of objects, denoted by
$A, B, C, \ldots \in Obj(\mathcal{A})$ (the objects of $\mathcal{A}$), and for each pair of objects $A, B$ a set of
morphisms, $Mor(A,B)$, also denoted by $\mathcal{A}(A,B)$, and a composition relation on
morphisms such that if $f : A \to B$ and $g : B \to C$ are morphisms, then there is
a morphism $g \circ f : A \to C$, the composition of $f$ and $g$. For these notions the
following two axioms are required for a category.

(i) The composition of morphisms is associative, that is
$h \circ (g \circ f) = (h \circ g) \circ f$.

(ii) For every object $A$ there is the identity morphism $id_A$ with the
properties $f \circ id_A = f$ and $id_B \circ f = f$ for all $f : A \to B$.

(Note that $Mor(A,B)$ can be empty.)

We briefly emphasize here that the arrow notation for morphisms is of basic
importance. We shall use $f : A \to B$ as well as $A \xrightarrow{f} B$ to denote morphisms.
The arrow notation is well suited to illustrate (to “visualize”) a broad spectrum
of modeling problems in a categorical sense (e.g. everything dealing with
relational structures).

Some typical examples of categories in mathematics are, among others: the 
category of sets, groups, monoids, topological spaces, vector spaces over a field,
etc. General relational structures can be interpreted in categorical terms (cf.
[Pfa94]). Summarizing, one can say that category theory discusses the basic 
features of “everyday work” when dealing with spaces in a certain discipline and 
studying structure preserving functions (the morphisms) between spaces. 

For later use we introduce the notion of the category of pointed sets, SET*.
A pointed set (set with base point) $X_a$ is a set $X$ together with a selected “base
point” $a \in X$. If $X_a$, $X_b$ are pointed sets, then a base-point-preserving map is a
map $f : X_a \to X_b$ such that $f(a) = b$. With pointed sets as objects, base-point-preserving
maps as morphisms, and ordinary composition of maps we obtain the
category SET*. The notion of a pointed space (space with base point) is basic
in algebraic topology (homotopy theory).
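As a small illustration of pointed sets, the following sketch represents maps between finite sets as dictionaries and checks that the composition of two base-point-preserving maps again preserves the base point. The concrete sets and maps are hypothetical.

```python
def is_base_point_preserving(f, a, b):
    """True if the map f (given as a dict) sends base point a to base point b."""
    return f[a] == b


def compose(g, f):
    """Ordinary composition g o f of maps given as dicts."""
    return {x: g[f[x]] for x in f}


# Hypothetical pointed sets (X, 1), (Y, "p"), (Z, True).
f = {1: "p", 2: "q", 3: "p"}   # X = {1, 2, 3} -> Y = {"p", "q"}
g = {"p": True, "q": False}    # Y -> Z = {True, False}
gf = compose(g, f)
```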

Definition 2. The notion of a functor constitutes a concept of “function” between
categories. Let $\mathcal{X}$ and $\mathcal{Y}$ denote two categories. Then a functor $F : \mathcal{X} \to \mathcal{Y}$
assigns to every object $A \in Obj(\mathcal{X})$ an object $F(A)$ in the category $\mathcal{Y}$
and to every morphism $f : A \to B$ in $\mathcal{X}$ a morphism $F(f) : F(A) \to F(B)$ in
$\mathcal{Y}$ such that the following holds for morphisms $f : A \to B$, $g : B \to C$ and $id_A$
in $\mathcal{X}$:

(1) $F(g \circ f) = F(g) \circ F(f)$

(2) $F(id_A) = id_{F(A)}$

More specifically, such a functor is called covariant; it is called contravariant
if it reverses arrows and thus reverses the order of the arrows of a composition
of morphisms (i.e. $F(g \circ f) = F(f) \circ F(g)$).
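A familiar covariant functor from programming is the “list” construction: a set is sent to the lists over it, and a map `f` to the map that applies `f` elementwise. The sketch below checks the two functor laws on one sample input; this is a spot check under that assumption, not a proof, and the names `fmap`, `f`, `g` are illustrative.

```python
def compose(g, f):
    """Composition g o f of ordinary functions."""
    return lambda x: g(f(x))


def identity(x):
    return x


def fmap(f):
    """Morphism part of the list functor: f |-> F(f), applied elementwise."""
    return lambda xs: [f(x) for x in xs]


f = lambda n: n + 1   # f : int -> int
g = str               # g : int -> str
sample = [1, 2, 3]

lhs = fmap(compose(g, f))(sample)          # F(g o f)
rhs = compose(fmap(g), fmap(f))(sample)    # F(g) o F(f)
```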

On the next higher level of abstraction the notion of a natural transformation 
is settled. It is a kind of a function between functors and is defined as follows. 

Definition 3. Let $F : \mathcal{X} \to \mathcal{Y}$ and $G : \mathcal{X} \to \mathcal{Y}$ be two functors. A natural
transformation $\sigma : F \to G$ is given by the following data.

For every object $A$ in $\mathcal{X}$ there is a morphism $\sigma_A : F(A) \to G(A)$ in $\mathcal{Y}$ such
that for every morphism $f : A \to B$ in $\mathcal{X}$ the following diagram (square) is
commutative.

$$
\begin{array}{ccc}
F(A) & \xrightarrow{\;\sigma_A\;} & G(A) \\
{\scriptstyle F(f)}\big\downarrow & & \big\downarrow{\scriptstyle G(f)} \\
F(B) & \xrightarrow{\;\sigma_B\;} & G(B)
\end{array}
$$

Commutativity means (in terms of equations) that the following compositions
of morphisms are equal: $G(f) \circ \sigma_A = \sigma_B \circ F(f)$.
The morphisms $\sigma_A$, $A \in Obj(\mathcal{X})$, are called the components of the natural
transformation $\sigma$.
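The naturality square can be spot-checked in code. In this sketch F is the list construction, G is a hypothetical “maybe” construction (values are `()` or `(x,)`), and the component at each set is `safe_head`, which returns the first list element if there is one. All names are illustrative, and the check covers sample inputs only.

```python
def list_map(f):
    """F(f): apply f to every element of a list."""
    return lambda xs: [f(x) for x in xs]


def maybe_map(f):
    """G(f): apply f inside a 'maybe' value, encoded as () or (x,)."""
    return lambda m: tuple(f(x) for x in m)


def safe_head(xs):
    """Component F(A) -> G(A): first element if present, else the empty value."""
    return (xs[0],) if xs else ()


def square_commutes(f, xs):
    """Check G(f) o component_A == component_B o F(f) on the input xs."""
    return maybe_map(f)(safe_head(xs)) == safe_head(list_map(f)(xs))
```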

For more material about categories we refer to the rather extensive literature 
(here only citing [AHS90], [Lan98], [LS96], [Pie91]). 

3.2 Colored Graphs 

In our considerations below we are working with the following definition of a 
graph and (general) net. We keep close to the exposition of that material as 
presented in [Pfa95] and the literature used there. 

Definition 4. A directed graph (with orientation) $\Gamma$ is given by a set of vertices
(nodes) $V\Gamma \neq \emptyset$ and a set of edges (arcs) $E\Gamma$, where $V\Gamma \cap E\Gamma = \emptyset$, and two
incidence maps $\iota : E\Gamma \to V\Gamma$, $\tau : E\Gamma \to V\Gamma$.

For $e \in E\Gamma$ we say that $\iota e$ is the initial point and $\tau e$ is the terminal point
of the edge $e$, respectively. Through the maps $\iota$ and $\tau$ each edge $e$ obtains an
orientation and in that way the graph is directed with orientation. In general,
we allow $\iota e = \tau e$, which means that $e$ is a loop.

A (general) net is defined as a directed (and, if necessary, oriented) graph $\Gamma$
with a valuation or weighting or coloring $\omega : E\Gamma \to R$.

Let $\Gamma$ and $X$ denote two directed (oriented) graphs, as defined at the beginning.
A graph morphism $\alpha : \Gamma \to X$ is defined in the obvious way as a structure
preserving map from vertices to vertices and edges to edges. More explicitly, let
$e$ be an edge with vertices $\iota e = v$, $\tau e = w$, then the image $\alpha(e)$ is the edge
with vertices $\alpha(v)$, $\alpha(w)$.

The definitions of a graph isomorphism and a graph automorphism are then clear.
Accordingly, for general nets a morphism is defined as a graph morphism
which respects the coloring on the edges. That means
$\omega(e_1) = \omega(e_2) \Rightarrow \omega(\alpha e_1) = \omega(\alpha e_2)$; this is compatible with the definition of a morphism in the
category of geometric spaces given below.
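The two conditions on a net morphism (compatibility with the incidence maps, and preservation of equal colors) translate directly into code. This is a minimal sketch; the dictionaries encoding the graphs and all edge, vertex and color names are hypothetical.

```python
def is_graph_morphism(ends_src, ends_dst, vmap, emap):
    """ends_*: dict edge -> (initial vertex, terminal vertex).

    The image of an edge from v to w must run from vmap[v] to vmap[w].
    """
    return all(
        ends_dst[emap[e]] == (vmap[v], vmap[w])
        for e, (v, w) in ends_src.items()
    )


def respects_coloring(color_src, color_dst, emap):
    """Equal colors on two source edges imply equal colors on their images."""
    return all(
        color_dst[emap[e1]] == color_dst[emap[e2]]
        for e1 in color_src
        for e2 in color_src
        if color_src[e1] == color_src[e2]
    )


# Hypothetical example: a path a -> b -> c mapped onto a 2-cycle u <-> v.
ends_g = {"e1": ("a", "b"), "e2": ("b", "c")}
ends_h = {"d1": ("u", "v"), "d2": ("v", "u")}
vmap = {"a": "u", "b": "v", "c": "u"}
emap = {"e1": "d1", "e2": "d2"}
color_g = {"e1": "red", "e2": "red"}
color_h = {"d1": 1, "d2": 1}
color_h_bad = {"d1": 1, "d2": 2}
```

Note that the colors themselves need not be mapped to each other; only equality of colors has to be preserved.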

3.3 A Category of Geometric Spaces 

We briefly recall here the definition of a geometric space and for the details
we refer to the literature. Noncommutative geometry was introduced by J. André
more than 20 years ago as a natural generalization of classical affine geometry. A
line $x \sqcup y$ joining two points ($x, y \in X$) of the underlying point set $X$ is directed
(e.g. like a ray in Euclidean spaces); thus, $x \sqcup y \neq y \sqcup x$ (noncommutativity of
the join operation), in general. As usual, the notion of parallelism is defined as
an equivalence relation on the set of all lines of a space. As a very brief selection
of references of the extensive work of J. André we cite [And88, And92, And93].
In [Pfa85] a new modeling approach for noncommutative spaces was introduced
leading to an algebraization of the notion of a noncommutative geometric space.
It turns out that the parallelism of a space already encodes the whole geometric
structure. In this approach a geometric space is defined by a map $<,> : X^2 \to R$
called parallel map or parallelism; a line joining two points $x, y \in X$ is defined
as follows:

$x \,\square\, y := x \sqcup y \cup \{<x,y>\}$, where $<x,y>$ is called the ideal point or
direction or color of the line and $x \sqcup y := \{x\} \cup \{z \mid <x,z> = <x,y>\}$ is
the set of proper points of the line; it is the solution set of the equation
$<x,z> = <x,y>$ in the unknown $z$. $R$ denotes the set of directions (colors) of a space. The validity
of geometric axioms and configurations can be expressed in terms of the solvability
of corresponding equations in this “calculus” (cf. [Pfa85]). Concerning geometric
axioms, the parallel shifting of triangles and, more generally, the shifting of
simplices across a space, is of basic importance. If such configurational conditions
hold, a space has a rich geometric structure (cf. e.g. [Pfa87]).

An example for illustration (ray space): Let $X := \mathbb{R}^n$ be the real $n$-space,
then a noncommutative space with a rich geometric structure whose lines are the
(directed) rays can be defined by $x \sqcup y := x + \mathbb{R}_+ \cdot (y - x)$, the ray beginning
in $x$ going through $y$ ($\mathbb{R}_+$ denotes the non-negative reals). Two rays are parallel
if they have the same direction. In terms of a suitable parallel map $<,>$ the
ray space can be defined as follows: $X := \mathbb{R}^n$, $R := S^{n-1} \cup \{0\}$, where $S^{n-1}$
denotes the $(n-1)$-dimensional unit sphere in $\mathbb{R}^n$. Then the $n$-dimensional
ray space is defined by the parallel map $<,> : X^2 \to R$, $<x,y> := \frac{y - x}{\|y - x\|}$
if $x \neq y$, and $<x,x> := 0$. Applying the definition of a line, the set of
proper points of the line joining two different points $x$ and $y$ is given by the ray
$x \sqcup y = x + \mathbb{R}_+ \cdot (y - x)$, whereas $x \sqcup x = \{x\}$.
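The ray space lends itself to a short numerical sketch. Here the parallel map is taken to be the unit vector pointing from x towards y (the zero vector when x = y), which matches the stated set of directions, the unit sphere together with 0; the tolerance parameter `tol` is an implementation assumption for floating-point comparison.

```python
import math


def direction(x, y):
    """Parallel map <x, y>: unit vector from x towards y, zero vector if x == y."""
    diff = [b - a for a, b in zip(x, y)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return tuple(0.0 for _ in diff)
    return tuple(d / norm for d in diff)


def parallel(x, y, u, v, tol=1e-12):
    """The lines through (x, y) and (u, v) are parallel iff <x, y> == <u, v>."""
    return all(abs(a - b) <= tol for a, b in zip(direction(x, y), direction(u, v)))


def on_ray(x, y, z, tol=1e-12):
    """z is a proper point of the line from x through y: z == x or <x, z> == <x, y>."""
    return tuple(z) == tuple(x) or parallel(x, z, x, y, tol)
```

Swapping the two points reverses the direction, so the ray from x through y and the ray from y through x are not parallel; this is the noncommutativity of the join in this example.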

A morphism between two geometric spaces $(X_1, <,>_1, R_1)$, $(X_2, <,>_2, R_2)$ is
defined as a map $f : X_1 \to X_2$ which respects the underlying geometric structure,
namely the parallelism, i.e. the following condition holds for all $x, y, u, v \in X_1$:
if $<x,y>_1 = <u,v>_1$ then $<f(x),f(y)>_2 = <f(u),f(v)>_2$.

With these notions of a geometric space and corresponding morphisms we 
obtain the category of geometric spaces which we denote by NCG (cf. [Pfa98] for 
a summary) . The composition of morphisms is the usual one (as in set theory) . 
For illustrational purposes (and to prepare pictorially how we pass on to geomet- 
ric nets later) we include Fig. 3 below showing the “local view” of a geometric 
space in a selected point $x$ (i.e. the “pointed space” $X_x$, $x \in X$).



3.4 Geometric Nets 

After this brief collection of the basic introductory notions from geometric spaces 
we come to the natural link with net theory. 

The “$<,>$-notion” for geometric spaces leads in a natural way to an associated
net which we will call geometric net. Let $(X, <,>, R)$ be a given geometric
space. The following directed (oriented) colored graph $\Gamma$ (geometric net) can be
associated with it, naturally:

$V\Gamma := X$, $E\Gamma := X^2$, $\iota e := x$, $\tau e := y$, for $e = (x,y) \in E\Gamma$.

The coloring $\omega : E\Gamma \to R$ is defined by $\omega(e) := <x,y>$, for $e = (x,y) \in E\Gamma$.
This leads to the associated geometric net. In all those cases where we want to
avoid loops $e = (x,x)$ we can work with $E\Gamma = X^2 \setminus \Delta_X$ (where
$\Delta_X = \{(x,x) \mid x \in X\}$, all diagonal pairs), but in general we do not assume this, a priori.
Thus, in our geometric nets it is suggestive to illustrate an edge $e = (x,y)$ with
$\iota e \neq \tau e$ as a directed line segment with color (label) $<x,y>$ (cf. the edge
joining $x$ and $y$ in the “local view” of a geometric space (Fig. 3)). Conversely,
given a (general) net, we obtain a geometric space having the nodes of the net as
points and the corresponding parallel map induced by the coloring. In general,
the underlying graph of a net is not complete (i.e. there are nodes not connected
by an edge). If necessary we can make it complete: let $x, y$ be two nodes that
are not connected, then we define the direction $<x,y> = \infty$ in the induced
geometric space, where $\infty$ is an additional, artificial direction (color). In this
way we obtain the induced parallel map on all pairs of the underlying point set
of the induced geometric space. Analogous to the category of geometric spaces
NCG we obtain GeoNET, the category of geometric nets.

Fig. 3. “local view” (pointed space): illustration of a geometric space (locally)
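Both directions of this correspondence are easy to sketch: from a geometric space to its associated net, and from an incomplete net back to a parallel map defined on all pairs. An edge is identified with its pair (x, y), so the initial and terminal points are the two components; the constant `INF` is an assumed stand-in for the additional artificial color.

```python
from itertools import product

INF = "inf"  # assumed stand-in for the additional, artificial direction (color)


def associated_net(points, bracket, loops=True):
    """Geometric net of a space: edges are pairs (x, y), colored by <x, y>.

    With loops=False the diagonal pairs (x, x) are left out.
    """
    return {
        (x, y): bracket(x, y)
        for x, y in product(points, repeat=2)
        if loops or x != y
    }


def induced_parallel_map(nodes, coloring):
    """Complete a (possibly incomplete) net: unconnected pairs get the color INF."""
    return {
        (x, y): coloring.get((x, y), INF)
        for x, y in product(nodes, repeat=2)
    }


# Hypothetical examples.
cyclic = lambda x, y: (y - x) % 3
net = associated_net([0, 1, 2], cyclic)
completed = induced_parallel_map(["a", "b"], {("a", "b"): 5})
```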

4 Associated Geometric Networks 

As we already mentioned in section 2, the design of the network structures in 
Geiger’s paradigm is amenable to mathematical modeling with methods pre- 
sented in the foregoing section. 

For illustration, below we briefly sketch in a concrete example how geometric 
features arise naturally. To this end we take a standard network structure which 
arises in applications. In our case we consider the following filter for line segment 
detection in pixel images; it is commonly in use (cf. for example [Roj93], section
3.4). All points which we consider are elements of a regular integer grid (we
simply take $\mathbb{Z} \times \mathbb{Z}$). Locally, that means from the view of a selected grid point,
denoted by $x_0$, we consider a specific neighborhood, in our case the eight nearest
neighbors in the grid, denoted by $x_1, x_2, \ldots, x_8$, sitting around the center $x_0$.

The filter we are considering is then defined by the connection matrix 

-1 -1 -1 
-1 +8 -1 
-1 -1 -1 

This means that the central node receives a signal input weighted by $-1$
from all those neighboring pixels which are set to one (i.e. “black colored”).
All these input signals are summed up and added to $+8$; the result is compared
with a threshold, and if this threshold is exceeded an output value 1 is produced,
indicating that the corresponding central point is part of a line segment. This is,
roughly speaking, how the local feature extractor in a corresponding perceptron
network would evaluate this filter. Now we associate with this situation
the following network (local geometric net), which we illustrate in the following
picture.




[Figure: local geometric net with the center $x_0$ and its eight neighbors $x_1, \ldots, x_8$]




The corresponding net weights on the edges (coloring of the net) can be
interpreted in terms of the following parallel map (coloring)

$<,> : \{x_0, x_1, \ldots, x_8\} \times \{x_0, x_1, \ldots, x_8\} \to \{-1, +8\}$,

with $<x_0,x_0> := +8$, $<x_0,x_i> := -1$, for all $i = 1, 2, \ldots, 8$,

defining an associated geometric space locally, that corresponds to our associated
geometric net. “Locally” here means that we are only defining the values
$<x_0,x_i>$ with respect to the “base point” $x_0$ from which the space is locally
regarded. We can easily extend this definition of $<,>$ to the whole grid by
setting $<x_0,z> := 0$ (or $<x_0,z> := \infty$), for all other grid points $z$, and repeat
this definition in every point $x_0$, accordingly. This, finally, yields the complete
parallel map (coloring). Thus, evaluating the line $x_0 \sqcup x_1$, which is equal to
$x_0 \sqcup x_i$, for all $i = 1, 2, \ldots, 8$, we obtain
$x_0 \sqcup x_1 = \{x_0\} \cup \{x_k \mid <x_0,x_k> = <x_0,x_1> = -1\} = \{x_0\} \cup \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$, and this is exactly the
“discrete” circle around $x_0$ with “radius” (color) equal to $-1$. Hence we have
obtained a space which is related to a “circle space” (cf. [Pfa85], [Pfa98]). We
point out once more that this associated net is to be seen as an abstraction of the
concrete local feature extractor as used in Geiger’s paradigm (and similarly in a
perceptron-like network). It encodes the essential information that is needed.
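The local coloring and the resulting “discrete circle” can be reproduced with a few lines of code. This is a sketch under stated assumptions: grid points are integer pairs, an image is modeled as the set of its black pixels, the color for all other grid points is taken to be 0, and the filter response is read as a weighted sum over the 3 x 3 window (the threshold itself is not specified in the text and is left to the caller).

```python
NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def bracket(x0, z, outside=0):
    """Local parallel map with base point x0 on the grid Z x Z."""
    offset = (z[0] - x0[0], z[1] - x0[1])
    if offset == (0, 0):
        return 8
    if offset in NEIGHBORS:
        return -1
    return outside


def proper_points(x0, y, candidates):
    """Proper points of the line from x0 through y, searched among `candidates`."""
    return {x0} | {z for z in candidates if bracket(x0, z) == bracket(x0, y)}


def filter_response(black, x0):
    """Sum of the colors <x0, z> over all black pixels z in the 3x3 window of x0."""
    window = [x0] + [(x0[0] + dx, x0[1] + dy) for dx, dy in NEIGHBORS]
    return sum(bracket(x0, z) for z in window if z in black)


candidates = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
circle = proper_points((0, 0), (1, 0), candidates)
```

For a black center pixel with k black neighbors the response is 8 - k, so a thin line through the center scores high while a solid black region scores 0.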

Based on the previous considerations it turns out that a certain step of
abstraction is convenient, in the following sense. Given a concrete “real” ANN
(designed for a particular task), denoted by $\mathcal{N}$, we assign to it an “abstract
associated network”. This associated net, denoted by GNET($\mathcal{N}$), encodes all
the essential network information, but is much better suited for mathematical
modeling purposes. The nodes of GNET($\mathcal{N}$) represent the neurons of $\mathcal{N}$, the
edges represent the synaptic connections (including weights), and the structure
of GNET($\mathcal{N}$) represents the network structure (“topology”) of $\mathcal{N}$ (numerically
expressible in terms of a corresponding connection matrix). With respect to the
category of geometric nets, GNET($\mathcal{N}$) is interpreted as an object of GeoNET.

Now we come to the “local view” of a network as illustrated in the figures
above. Mathematically, the local view of a network corresponds to a pointed
net (analogous to pointed set), i.e. we select a node and describe the (local)
network structure from the “viewpoint” of this specific node. Due to the fact
that an ANN $\mathcal{N}$, and consequently GNET($\mathcal{N}$), in each node has the same local
structure (cf. section 2), it is reasonable to assign to a particular GNET($\mathcal{N}$) a
specific pointed net, with respect to a selected node, say $x_0$, as base point (cf.
the category of pointed sets in section 3). This local description of the network
contains all the essential information, because the entire (global) network can
be reconstructed by taking all local networks together (shifting the pointed net
over all other nodes). To shorten notation we set $X :=$ GNET($\mathcal{N}$) and let $X_{x_0}$
denote the pointed (local) network structure. It is important to note that the
(geometric, local) net structure of $X_{x_0}$ is completely given by the “local data”
(connections, colors) $<x_0,y>$, where $y$ runs through all the points (nodes) of
$X$. Due to local regularity (as previously mentioned), all pointed nets are
isomorphic to each other. This means that in every base point (node) we have the same
situation concerning net structure and synaptic connections (colors). Thus the
local description is less complex, but represents the essential net information. In
terms of geometric spaces this amounts to saying that in every pointed space the
same geometric configurations appear. This is a very strong regularity condition
and leads to a rich geometric structure.
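The reconstruction of the global net by “shifting the pointed net over all other nodes” can be sketched for the grid case as follows. The local data are stored per offset from the base point; the 3 x 3 coloring used here and the color 0 for all other pairs are assumptions carried over from the filter example.

```python
NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
LOCAL = {(0, 0): 8, **{d: -1 for d in NEIGHBORS}}  # local data, indexed by offset


def make_global(local, outside=0):
    """Shift the pointed net over the grid: <x, y> := local[y - x], else `outside`."""
    def bracket(x, y):
        return local.get((y[0] - x[0], y[1] - x[1]), outside)
    return bracket


def local_view(bracket, x0, radius):
    """Pointed net at x0, restricted to a window and indexed by offset from x0."""
    span = range(-radius, radius + 1)
    return {
        (dx, dy): bracket(x0, (x0[0] + dx, x0[1] + dy))
        for dx in span
        for dy in span
    }


bracket = make_global(LOCAL)
```

Because the global map depends only on the offset y - x, the local views at any two base points coincide, which is the isomorphism of pointed nets in this example.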

On the basis of these abstractions we formulate the process of learning. The
standard way to describe learning is to change (update) the synaptic weights
(colors) of the directed edges of a network. Thus, if $i$ and $j$ are the indices
of connected neurons and $w_{ij}$ denotes the connection strength, then a learning
step is described by $w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$. Usually, $w_{ij}$ is an integer or real number
and $\Delta w_{ij}$ is determined by a learning rule. In accordance with our notation
in the categories NCG and GeoNET, respectively, we can express a learning
step as follows. Let $x_i, x_j$ denote the nodes in $X =$ GNET($\mathcal{N}$) representing the
corresponding neurons in the ANN, then $<x_i,x_j> = w_{ij}$. We point out that
in our proposed approach the data $<x,y>$ associated with a pair of nodes
is not restricted to a numerical value only, but can consist of further data like
states (activity modes) of the nodes (neurons). The following argument applies
to Geiger’s regularly structured ANNs. If there is a pair of nodes (neurons) $x_k, x_l$
with $<x_k,x_l> = w_{kl} = w_{ij} = <x_i,x_j>$, then a learning step causes the same
change of weights, i.e. $\Delta w_{ij} = \Delta w_{kl}$. In other words, if two connection weights
are equal, then after a learning step both can be changed, but their equality is
preserved. This motivates us to model a learning step as a morphism in the sense
of the categories NCG, GeoNET, respectively, in the following way. Let $X_0$
denote the initial associated network and $X_1$ the resulting network after the first
learning step. Supposing that we do not alter the set of nodes (neurons), which is
usually the case, a learning step can be modeled as an identity mapping on
the nodes and a change of the corresponding weights or coloring. Geometrically,
this means that the parallel map (coloring) $<,>_0 : X_0^2 \to R$ after learning step
1 changes into $<,>_1 : X_1^2 \to R$. Due to the previous remark, such a learning
step corresponds to a morphism, denoted by $L : X_0 \to X_1$, since the following
holds. For points (nodes) $x, y, u, v$ in $X_0$ we obtain:

$<x,y>_0 = <u,v>_0 \Rightarrow <L(x),L(y)>_1 = <L(u),L(v)>_1$. Thus $L$,
being the identity on nodes (points) of $X_0$, has the property of a morphism. We
suggest giving this the name “homomorphic learning”, formulated in a categorical
environment. Consequently, a learning process can be expressed in terms of
a sequence of learning steps (composition of morphisms $L$), i.e.

$X_0 \xrightarrow{L} X_1 \xrightarrow{L} X_2 \xrightarrow{L} \cdots \xrightarrow{L} X_{n-1} \xrightarrow{L} X_n$.

If no confusion can arise we use the same symbol $L$ for each morphism in this
sequence.
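The morphism property of a learning step can be checked mechanically on a toy net. The sketch assumes, as in the argument above, that the update of a weight depends only on its current value, so that equal weights receive equal updates; the weights, the update rules and the function names are hypothetical.

```python
def learning_step(coloring, delta_for_value):
    """One learning step: every edge with weight w gets w + delta_for_value(w).

    Assumption: the update depends only on the current weight.
    """
    return {edge: w + delta_for_value(w) for edge, w in coloring.items()}


def is_morphism(col0, col1):
    """Identity on nodes: <x,y>_0 == <u,v>_0 must imply <x,y>_1 == <u,v>_1."""
    return all(
        col1[e1] == col1[e2]
        for e1 in col0
        for e2 in col0
        if col0[e1] == col0[e2]
    )


col0 = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 2}
col1 = learning_step(col0, lambda w: 2 * w)                # weights 3, 3, 6
col2 = learning_step(col1, lambda w: 1 if w == 3 else 0)   # weights 4, 4, 6
broken = {("a", "b"): 3, ("b", "c"): 5, ("a", "c"): 6}     # equality destroyed
```

The composite of the two steps is again a morphism from `col0` to `col2`, which mirrors the sequence of learning steps as a composition of morphisms.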

Analogously, to each associated (global) geometric net $X_i$ in the above sequence
we obtain a corresponding pointed net $(X_i)_{x_0}$, locally, in base point
(selected node) $x_0$. Learning w.r.t. these pointed nets functions in exactly the
same way as described previously and the local change of synaptic connections
matches the corresponding changes in $X_i$. Consequently, we have a corresponding
sequence of pointed geometric nets representing “local” learning which is
“cheaper” than “global” learning considered above:

$(X_0)_{x_0} \to (X_1)_{x_0} \to (X_2)_{x_0} \to \cdots \to (X_{n-1})_{x_0} \to (X_n)_{x_0}$
Analogous to the category SET*, we introduce the category of pointed geometric
nets GeoNET*, a subcategory of GeoNET. The process of assigning to a
geometric net $X$ the pointed net $X_{x_0}$ (with base point $x_0$, a node of $X$) is of
functorial nature in the sense of a functor in category theory. Let us consider a
fixed (finite) set of points (nodes) $P$ interpreted as nodes of (associated) geometric
nets. Then the collection of all geometric nets having the same underlying set
of nodes (points) $P$ is a subcategory of GeoNET, denoted by GeoNET(P). Let
$x_0$ be a node of $P$, then we can define the following functor
$P_{x_0} : \mathrm{GeoNET}(P) \to \mathrm{GeoNET}^*$, that assigns to each geometric net $X$ in GeoNET(P) the
corresponding pointed net $X_{x_0}$. It is clear that a morphism of GeoNET(P) leads
to a corresponding morphism of GeoNET*. The properties of a functor are
naturally verified.

Finally, we can establish the following commutative diagram, using the
previously introduced notation.

$$
\begin{array}{ccc}
X & \xrightarrow{\;\sigma_X\;} & X_{x_0} \\
{\scriptstyle L}\big\downarrow & & \big\downarrow{\scriptstyle L} \\
X' & \xrightarrow{\;\sigma_{X'}\;} & X'_{x_0}
\end{array}
$$

For the objects and morphisms on the right side of the commutative square the
following relations hold: $X_{x_0} = P_{x_0}(X)$, $X'_{x_0} = P_{x_0}(X')$ and $L = P_{x_0}(L)$
(the same symbol $L$ denotes the induced morphism of pointed nets). This
diagram corresponds to the natural transformation $\sigma : Id \to P_{x_0}$, where
$Id$ is the identity functor on GeoNET (and thus on GeoNET(P)), $P_{x_0}$ is
the previously defined functor, and $\sigma_X$, $\sigma_{X'}$ are arbitrary components of the natural
transformation. Analyzing this diagram we can observe that “Learning on the
right is cheaper” (for more details we refer to the remark on the “local view”
of a network in section 4). This fact can be used to reduce the complexity of
training (learning) in a particular ANN. In the next section we briefly present
an application where this effect was exploited.
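Commutativity of the square, and the saving on the right-hand side, can be illustrated on a toy net. In this sketch the passage to the pointed net keeps only the edges leaving the base point, and a learning step updates each weight by a rule that depends only on the current weight; both choices, like the function names `pointed` and `learn`, are assumptions made for the illustration.

```python
def pointed(x0):
    """Passage from a net to its pointed net at x0: keep only the data <x0, y>."""
    def restrict(coloring):
        return {(x, y): c for (x, y), c in coloring.items() if x == x0}
    return restrict


def learn(coloring, delta_for_value):
    """A learning step whose update depends only on the current weight (assumption)."""
    return {edge: w + delta_for_value(w) for edge, w in coloring.items()}


# Hypothetical complete net on 4 nodes with cyclic coloring.
X = {(x, y): (y - x) % 4 for x in range(4) for y in range(4)}
rule = lambda w: w * w

global_then_local = pointed(0)(learn(X, rule))   # learn on the left, then restrict
local_then_learn = learn(pointed(0)(X), rule)    # restrict, then learn on the right
```

Both paths give the same pointed net, but the right-hand path updates only the 4 local weights instead of all 16.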



5 An Industrial Application: Optical Quality Control 

In this section we briefly describe an industrial project carried out by H. Geiger, 
in which the application of our previously sketched mathematical modeling ap- 
proach showed unexpected economic effects. In the project a quality control 
problem in production of tiles had to be solved (cf. [Gei94]). Among others, a 
problem was to detect breaks in the edges of the upper and lower surface of a
tile, deformation of edges, and pores in the upper surface (pores of more than about
3 mm diameter lead to rejection). Uneven distribution of the embedded material must
be avoided, and certain changes in color are unacceptable. Rough spots on the surface
have to be singled out, and failures during polishing can even cause breaks
in the surface. Time constraints had to be taken into account: all the checks
must be done within a 2-second time interval per tile. At least 4 camera images
($512 \times 512$ pixels per image) are necessary to check all the edges and surfaces. A
main requirement is the flexibility of use. A change of the material of the test
objects should be possible within less than 1 hour and without changing the
software. The system should have a minimal robustness concerning disturbances
(like changes of light, position, speed). Typical for such real world problems is 
the great difficulty to achieve a symbolic, logic and closed form mathematical de- 
scription of the whole scenario. It turned out in that project that a key problem 
was that the criteria to be checked were not amenable to easy implementations 
in numerical algorithms. This is a typical situation where one can successfully 
apply neural network techniques. 

Finally, in the industrial project, the strategy of solution to these practical 
problems is based on a combination of ANN approaches and classical methods. 
The network used is designed as a multilayer network with preprocessing facili- 
ties, feature extraction and classification neurons at the output. As described in 
[Gei94] , direct exploitation of our joint mathematical modeling approach in ANN 
simulation with NeuroTools (exploiting the association step and the commuta- 
tivity of the corresponding diagram) led to considerable reductions in production 
costs. 

6 Concluding Remarks, Prospects 

In this note, our interest concentrated on the net structure of an ANN. The net- 
works we consider are (locally) regularly structured in terms of certain config- 
urations which can be interpreted as geometric configurations. This local struc- 
turing is homogeneously distributed over the entire net, i.e. the same type of 
(geometric) configurations can be found locally in each node. Of future interest 
will be the investigation of the “simplicial structure” of a network in the sense 
of simplex configurations (cf. [Pfa87]), possibly leading to a corresponding group
operation on a net. Besides that, other outcomes of interest include a semantic
modeling approach: interpreting a net structure as a general relational structure
and working with the category PATH (cf. [Pfa94]). Another idea is to apply
categorical constructions like limits, colimits (including products, coproducts)
in order to construct large networks from small components (modules); in such
processes morphisms play a basic role. 
Abstract. Functional networks are a powerful and recently introduced
Artificial Intelligence paradigm which generalizes the standard neural
networks. In this paper functional networks are used to fit a given set of
data from a tensor product parametric surface. The performance of this 
method is illustrated for the case of Bezier surfaces. Firstly, we build the 
simplest functional network representing such a surface, and then we use 
it to determine the degree and the coefficients of the bivariate polyno- 
mial surface that fits the given data better. To this aim, we calculate 
the mean and the root mean squared errors for different degrees of the 
approximating polynomial surface, which are used as our criterion of a 
good fitting. In addition, functional networks provide a procedure to de- 
scribe parametric tensor product surfaces in terms of families of chosen 
basis functions. We remark that this new approach is very general and 
can be applied not only to Bezier but also to any other interesting family 
of tensor product surfaces. 



1 Introduction 

1.1 Preliminaries

Computer-Aided Geometric Design (CAGD) is devoted to constructing a precise 
mathematical description of the shape of a real object, and focuses on the effi- 
cient computer representation of its geometry. Its range of applications includes 
areas like publicity, animation, multimedia tools, virtual reality, computer vision, 
robotics, etc. For an introduction to the field, the reader is referred to [8,10,12]. 

The main aspect of CAGD is the study of free-form curves and surfaces. They 
are essential tools (among others) in the automotive, aircraft and shipbuilding 
industries [1]. Roughly speaking, free-form curves and surfaces are parametric 
functions governed by a set of points (called control points) that more or less 
determine the shape of the curve or surface and many of its geometric properties. 

Free-form curves and surfaces have been extensively applied to fit data. In 
general, we are given a set of data and we look for the curve or surface following 
some functional structure and that minimizes the error on the prescribed data. A 
number of different methodologies to solve this problem have been described. The 
one proposed here takes advantage of a recent Artificial Intelligence paradigm 
which generalizes the standard neural networks: the so-called functional networks
[2].

In this paper we restrict ourselves to the case of Bezier surfaces. Firstly, we 
obtain the simplest functional network representing such a surface, and then we 
use it to determine the degree and the coefficients of the biparametric polynomial 
surface that fits the given data better. However, our proposal is very general and 
the same scheme can be successfully applied to any other interesting family of 
tensor product surfaces in CAGD. 

The structure of this paper is the following: firstly we introduce some math- 
ematical concepts and definitions. In addition, a set of data to be used later is 
obtained. Then, functional networks are motivated by introducing an example 
of a problem which cannot be well described in terms of neural networks. Sec- 
tion 2 describes the functional networks paradigm. Differences between neural 
and functional networks will also be discussed in this section. Section 3 gives 
a general methodology to work with these networks. The required steps of the 
method will be illustrated by its application to the parametric surfaces problem. 
Some interesting features of the method, like the possibility to determine the 
degree and the coefficients of the approximating surface that fits the given data 
better are also shown in this section. Finally, the paper closes with the main 
conclusions of this work. 



1.2 Some Mathematical Definitions 

A Bezier curve of degree $m$ is given by

$$\mathbf{C}(s) = \sum_{i=0}^{m} \mathbf{P}_i B_i^m(s) \qquad (1)$$

where $\{\mathbf{P}_i;\ i = 0, \ldots, m\}$ is a set of $(m+1)$ two- or three-dimensional points
called control points and $B_i^m(s)$ are the Bernstein polynomials of degree $m$,
defined as

$$B_i^m(s) = \binom{m}{i} s^i (1-s)^{m-i}, \qquad \text{where} \qquad \binom{m}{i} = \frac{m!}{i!\,(m-i)!}.$$

To make this definition useful practically, we focus on the parameter interval
[0, 1] (see [8,10]). Note that in this paper vectors are denoted in bold.
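As a minimal sketch, the Bernstein polynomials and a Bezier curve can be evaluated directly from their definitions. The control points in the test values are hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def bernstein(i, m, s):
    """Bernstein polynomial of degree m: C(m, i) * s^i * (1 - s)^(m - i)."""
    return comb(m, i) * s**i * (1 - s) ** (m - i)


def bezier_curve(control_points, s):
    """Point on the Bezier curve: sum over i of P_i * B_i^m(s), for s in [0, 1]."""
    m = len(control_points) - 1
    dim = len(control_points[0])
    return tuple(
        sum(bernstein(i, m, s) * p[k] for i, p in enumerate(control_points))
        for k in range(dim)
    )
```

Direct evaluation is adequate for the low degrees considered here; for higher degrees de Casteljau's algorithm is the numerically more robust choice.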



With this notation, a tensor product Bezier surface of degree $m \times n$ is given by

$$\mathbf{P}(s,t) = \sum_{i=0}^{m} \sum_{j=0}^{n} \mathbf{P}_{ij} B_i^m(s) B_j^n(t) \qquad (2)$$

where $\{\mathbf{P}_{ij} \mid i = 0, \ldots, m;\ j = 0, \ldots, n\}$ are also control points and $B_i^m(s)$, $B_j^n(t)$
are the Bernstein polynomials of degrees $m$ and $n$ respectively. Once again, the
variables $s$ and $t$ take values in the square $[0,1] \times [0,1]$.
1.3 Obtaining a Set of Data

To describe how functional networks work we need some data. In general, they 
come from an unknown surface to be obtained. However, since our primary goal 
is to show the performance of the functional networks in fitting surfaces, we will 
focus on data given from a parametric Bezier surface. Note that this limitation is 
motivated by academic purposes only, and the reader will not find any trouble in 
generalizing our statements to a set of points coming from any unknown surface. 

To this aim, we have selected a set of 121 data points $\{\mathbf{T}_{pq};\ p, q = 1, \ldots, 11\}$ in a
regular $11 \times 11$ grid from a Bezier surface. This surface has been generated from
(2) for the case $m = n = 3$, with 16 control points which are listed in Table 1.



(x,y,z)    (x,y,z)    (x,y,z)    (x,y,z)

(1,1,1)    (1,3,3)    (1,5,2)    (1,7,5)
(3,1,3)    (3,3,6)    (3,5,1)    (3,7,6)
(5,1,2)    (5,3,1)    (5,5,6)    (5,7,1)
(7,1,6)    (7,3,5)    (7,5,3)    (7,7,4)


Table 1. Control points used to define the parametric tensor product Bezier 
surface. 



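The resulting surface is called a bicubic tensor product Bezier surface in
CAGD. Its final expression is given by:

$$
\begin{pmatrix} x(s,t) \\ y(s,t) \\ z(s,t) \end{pmatrix} =
\begin{pmatrix}
1 + 6s \\
1 + 6t \\
\begin{aligned}
& 1 + 6s - 9s^2 + 8s^3 + 6t + 9st - 45s^2t + {} \\
& 27s^3t - 9t^2 - 45st^2 + 171s^2t^2 - 120s^3t^2 + {} \\
& 7t^3 + 33st^3 - 135s^2t^3 + 99s^3t^3
\end{aligned}
\end{pmatrix} \qquad (3)
$$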



In order to check the robustness of the proposed method, the third coordinate of
the 121 points $\{(x_k, y_k, z_k)\}$ was slightly modified by adding a uniform random
variable $U(-0.05, 0.05)$. Such a random variable plays the role of the measurement
error that usually appears in many realistic situations.
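
The data set described above can be reproduced along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: the grid is assumed to be equally spaced in the parameter square (end points included), the control net is assumed to be indexed so that rows of Table 1 follow $s$ and columns follow $t$, and the seed and function names are hypothetical.

```python
from math import comb
import random

# Control points of Table 1; CONTROL[i][j] = P_ij = (x, y, z).
# Assumption: table rows are indexed by i (direction s), columns by j (direction t).
CONTROL = [
    [(1, 1, 1), (1, 3, 3), (1, 5, 2), (1, 7, 5)],
    [(3, 1, 3), (3, 3, 6), (3, 5, 1), (3, 7, 6)],
    [(5, 1, 2), (5, 3, 1), (5, 5, 6), (5, 7, 1)],
    [(7, 1, 6), (7, 3, 5), (7, 5, 3), (7, 7, 4)],
]

def bernstein(i, m, s):
    """Bernstein polynomial B_i^m(s)."""
    return comb(m, i) * s**i * (1 - s)**(m - i)

def surface(s, t):
    """Bicubic tensor product Bezier surface of eq. (2) with m = n = 3."""
    return tuple(
        sum(bernstein(i, 3, s) * bernstein(j, 3, t) * CONTROL[i][j][d]
            for i in range(4) for j in range(4))
        for d in range(3))

def sample_grid(size=11, noise=0.05, seed=0):
    """Sample a size x size parameter grid; perturb z with U(-noise, noise)."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    data = []
    for p in range(size):
        for q in range(size):
            s, t = p / (size - 1), q / (size - 1)
            x, y, z = surface(s, t)
            data.append((s, t, x, y, z + rng.uniform(-noise, noise)))
    return data

data = sample_grid()  # 121 tuples (s, t, x, y, noisy z)
```

With this indexing the unperturbed coordinates agree with the closed-form expression (3), which is a convenient consistency check.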

1.4 Motivating Functional Networks 

Artificial neural networks have been recognized as a powerful tool for learning 
and simulating systems in a great variety of fields (see [7] and [9] for a survey of 
this field). However, not every approximation problem is adequately described 
in terms of a neural network. The following example illustrates this situation: 

Example: Suppose that we look for the most general family of parametric surfaces
$\mathbf{P}(s,t)$ such that their isoparametric curves (see [8] and [10] for a
description) $s = s_0$ and $t = t_0$ are linear combinations of the sets of functions
$\mathbf{f}(s) = \{f_0(s), f_1(s), \dots, f_m(s)\}$ and $\mathbf{f}^*(t) = \{f_0^*(t), f_1^*(t), \dots, f_n^*(t)\}$ respectively.
To be more precise, we look for surfaces $\mathbf{P}(s,t)$ that satisfy the
system of functional equations

$$\mathbf{P}(s,t) = \sum_{j=0}^{n} \boldsymbol{\alpha}_j(s)\, f_j^*(t) = \sum_{i=0}^{m} \boldsymbol{\beta}_i(t)\, f_i(s) \qquad (4)$$

where the sets of coefficients $\{\boldsymbol{\alpha}_j(s);\ j = 0, 1, \dots, n\}$ and $\{\boldsymbol{\beta}_i(t);\ i = 0, 1, \dots, m\}$
can be assumed, without loss of generality, to be sets of linearly independent
functions. Note that if they are not, we can rewrite the equations in (4) in the
same form but with linearly independent sets.




Fig. 1. (left) Graphical representation of a functional network for the paramet- 
ric surface of eq. (4); (right) Functional network associated with eq. (5). It is 
equivalent to the functional network on the left. 



This problem admits the graphical representation given in Figure 1 (left)
which, at first sight, looks like a neural network. However, a description
in terms of neural networks presents the following problems:

- Neural functions in neural networks are identical, whereas neural functions
in our example are different. For instance, we may find product and sum
operators (indicated in Figure 1 by the symbols ‘×’ and ‘+’ respectively).

- The neuron outputs of neural networks are different; however, in our scheme,
some neuron outputs in the example are coincident (this is the case of the
outputs in Figure 1 (left) associated with the last layer of neurons leading to
the value of P(s,t)).

These and other disadvantages suggest that the neural networks paradigm is
very restrictive and can be improved in several directions. Recently, a powerful
extension of neural networks has been introduced [2] and successfully applied to
several problems [3,4], without exhibiting the previous shortcomings. Such an
extension is based on the idea of allowing the activation functions of the neurons
to be unknown functions from a given family, which will be estimated during
the learning process.

Until now, functional networks have not been extensively applied to
CAGD. The first example of application of functional networks to CAGD was
given in [5]. However, only implicit and explicit surfaces were considered there.

2 Functional Networks 

2.1 Components of a Functional Network 

From Figure 1 (left) the main components of a functional network become clear:

1. Several layers of storage units. 

(a) A layer of input units. This first layer contains the input information. In 
this figure, this input layer consists of the units s and t. 

(b) A set of intermediate layers of storage units. They are not neurons but 
units storing intermediate information. This set is optional and allows 
connecting more than one neuron output to the same unit. In Figure
1 (left) there are two intermediate layers of storage units, which are
represented by small black circles.

(c) A layer of output units. This last layer contains the output information. 
In Figure 1 (left) this output layer reduces to the unit P(s,t).

2. One or more layers of neurons or computing units. A neuron is 
a computing unit which evaluates a set of input values, coming from the 
previous layer, of input or intermediate units, and gives a set of output values 
to the next layer, of intermediate or output units. Neurons are represented 
by circles with the name of the corresponding neural function inside. For 
example, in Figure 1 (left), we have three layers of neurons. The first one
gives outputs of functions with one variable. The second layer exhibits the 
same function for all its neurons, the product operator. Similarly, the last 
layer exhibits the sum operator for its two neurons. 

3. A set of directed links. They connect the input or intermediate layers to
their adjacent layer of neurons, and neurons of one layer to their adjacent
intermediate layers or to the output layer. Connections are represented by arrows,
indicating the information flow direction. We remark here that information
flows in only one direction, from the input layer to the output layer.

All these elements together form the network architecture or topology of the 
functional network, which defines the functional capabilities of the network. For 
example, since units are organized in series of layers, the functional network in 
Figure 1 (left) is a multilayer network.

2.2 Differences between Functional and Neural Networks

Some of the differences between functional and neural networks were already 
introduced in Section 1.4. In these paragraphs, we discuss these differences and 
the advantages of using functional networks instead of standard neural networks. 

1. In neural networks each artificial neuron receives an input value from the
input layer or the neurons in the previous layer. Then it computes a scalar
output $y = f(\sum_k w_{ik} x_k)$ from a linear combination of the received inputs
$x_1, x_2, \dots, x_n$ using a set of weights $w_{ik}$ associated with each of the links and
a given scalar function $f$ (the activation function), which is assumed the same
for all neurons. That is, each neuron returns an output $y = f(\sum_k w_{ik} x_k)$ that
only depends on the value $\sum_k w_{ik} x_k$. Therefore, their neural functions have
only one argument. On the contrary, neural functions in functional networks
can have several arguments.

2. In neural networks the neural functions are univariate: neurons can show 
different outputs but all of them represent the same values. In functional 
networks, the neural functions can be multivariate. 

3. In a given functional network the neural functions can be different, while in 
neural networks they are identical. 

4. In neural networks there are weights, which must be learned. These weights 
do not appear in functional networks, where neural functions are learned 
instead. 

5. In neural networks the neuron outputs are different, while in functional
networks neuron outputs can be coincident. As we shall see, this fact leads to a
set of functional equations, which have to be solved. These functional equa- 
tions impose strong constraints leading to a considerable reduction in the 
degrees of freedom of the neural functions. In most cases this implies that 
neural functions can be reduced in dimension or expressed as functions of 
smaller dimensions. 

All these features show that the functional networks exhibit more interesting 
possibilities than standard neural networks. This implies that some problems 
(e.g. the one introduced in Section 1.4) require functional networks instead of 
neural networks for their solution. In the next section, we shall take advantage 
of this fact by solving the problem of the characterization of parametric tensor 
product surfaces. 

3 Working with Functional Networks 

In this section, we describe the functional networks methodology, which is or- 
ganized, for clarity, into eight different steps. These steps are described by their 
application to the parametric surface example previously introduced in Section 1.

Step 1 (Statement of the problem): Understanding the problem to be 
solved. This is a crucial step, which has been done in Section 1. 

Step 2 (Initial topology): Based on the knowledge of the problem, the topology
of the initial functional network is selected. Thus, the system of functional
equations (4) leads to the functional network in Figure 1 (left). Note that the
above equations can be obtained from the network by considering the equality 
between the two values associated with the links connected to the output unit. 
We also remark that each of these values can be obtained in terms of the outputs 
of the preceding units by writing the outputs of the neurons as functions of their 
inputs, and so on. 

Step 3 (Simplification): In this step, the initial functional network is simplified 
using functional equations. Given a functional network, an interesting problem 
consists of determining whether or not there exists another functional network 
giving the same output for any given input. This leads to the concept of equiv- 
alent functional networks. Two functional networks are said to be equivalent if 
they have the same input and output units and they give the same output for 
any given input. The practical importance of this concept is that we can define 
equivalence classes of functional networks, that is, sets of equivalent functional
networks, and then choose the simplest in each class to be used in applications. 

Coming back to the example, it seems that the functions $\{\boldsymbol{\alpha}_j(s);\ j = 0, 1, \dots, n\}$
and $\{\boldsymbol{\beta}_i(t);\ i = 0, 1, \dots, m\}$ have to be learned. However, the functional
equations (4) put strong constraints on them. In fact, the general solution of this
functional equation is given by the following theorem (see reference [11] for
details):

Theorem 1. The most general family of parametric surfaces $\mathbf{P}(s,t)$ such that
all their isoparametric curves $s = s_0$ and $t = t_0$ are linear combinations of
the sets of linearly independent functions $\mathbf{f}(s) = \{f_0(s), f_1(s), \dots, f_m(s)\}$ and
$\mathbf{f}^*(t) = \{f_0^*(t), f_1^*(t), \dots, f_n^*(t)\}$ respectively, is of the form

$$\mathbf{P}(s,t) = \sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{P}_{ij}\, f_i(s)\, f_j^*(t) = \mathbf{f}(s) \cdot \mathbf{P} \cdot (\mathbf{f}^*(t))^T \qquad (5)$$

where $(\cdot)^T$ indicates the transpose of a matrix and $\mathbf{P}_{ij}$ are elements of an
arbitrary matrix $\mathbf{P}$; therefore, $\mathbf{P}(s,t)$ is a tensor product surface.

Two important conclusions can be derived from this theorem: 

1. No other functional forms for P(s,t) satisfy equations (4). So, no other
neurons can replace the neurons $\boldsymbol{\beta}_i$, $\boldsymbol{\alpha}_j$, $f_i$ and $f_j^*$. Therefore, eq. (5)
provides a characterization of the tensor product surfaces, a pressing question
in CAGD.

2. The functional structure of the solution is (5). This equation shows that the
functional network in Figure 1 (right) is equivalent to the functional network
in Figure 1 (left).
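
The equivalence between the tensor product form (5) and the isoparametric form (4) can be checked numerically. The sketch below is only an illustration for scalar-valued coefficients: the basis functions, the matrix `P` and the helper names are hypothetical, and the coefficient $\alpha_j(s) = \sum_i P_{ij} f_i(s)$ is what (5) implies when it is read as a combination of the $f_j^*(t)$.

```python
def tensor_product(f, f_star, P, s, t):
    """Evaluate eq. (5): sum_i sum_j P[i][j] * f[i](s) * f_star[j](t)."""
    return sum(P[i][j] * f[i](s) * f_star[j](t)
               for i in range(len(f)) for j in range(len(f_star)))

def alpha(f, P, j, s):
    """Coefficient alpha_j(s) = sum_i P[i][j] * f[i](s) of f*_j(t) in eq. (4)."""
    return sum(P[i][j] * f[i](s) for i in range(len(f)))

# Hypothetical bases and coefficient matrix, for illustration only.
f = [lambda s: 1.0, lambda s: s, lambda s: s * s]   # f_0, f_1, f_2
f_star = [lambda t: 1.0, lambda t: t]               # f*_0, f*_1
P = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 0.5]]

s0, t0 = 0.3, 0.7
direct = tensor_product(f, f_star, P, s0, t0)
# Same value, written as a linear combination of the f*_j(t) as in eq. (4).
via_alpha = sum(alpha(f, P, j, s0) * f_star[j](t0) for j in range(len(f_star)))
```

Fixing `s0` and varying `t0` traces one isoparametric curve, which is by construction a linear combination of the functions in `f_star`.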

Step 4 (Uniqueness of representation): Here, conditions for the neural 
functions of the simplified functional network must be obtained. For eq. (5), two 
cases must be considered: 

1. The $f_i(s)$ and $f_j^*(t)$ functions are given: Assume that there are two
matrices $\mathbf{P} = \{\mathbf{P}_{ij}\}$ and $\mathbf{P}^* = \{\mathbf{P}^*_{ij}\}$ such that

$$\mathbf{P}(s,t) = \sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{P}_{ij}\, f_i(s)\, f_j^*(t) = \sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{P}^*_{ij}\, f_i(s)\, f_j^*(t) \qquad (6)$$

Solving the uniqueness of representation problem consists of solving equation
(6). To this aim, we write (6) in the form

$$\sum_{i=0}^{m}\sum_{j=0}^{n} \left(\mathbf{P}_{ij} - \mathbf{P}^*_{ij}\right) f_i(s)\, f_j^*(t) = 0 \qquad (7)$$

Since the functions in the set $\{f_i(s)\, f_j^*(t) \mid i = 0, 1, \dots, m;\ j = 0, 1, \dots, n\}$
are linearly independent because the sets $\{f_i(s) \mid i = 0, 1, \dots, m\}$ and $\{f_j^*(t) \mid
j = 0, 1, \dots, n\}$ are linearly independent, from (7) we have

$$\mathbf{P}_{ij} = \mathbf{P}^*_{ij}; \qquad i = 0, 1, \dots, m; \quad j = 0, 1, \dots, n$$

that is, the coefficients $\mathbf{P}_{ij}$ in (5) are unique.

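2. The $f_i(s)$ and $f_j^*(t)$ functions are to be learned: In this case, assume
that there are two sets of functions $\{f_i(s), f_j^*(t)\}$ and $\{\tilde{f}_i(s), \tilde{f}_j^*(t)\}$, and two
matrices $\mathbf{P}$ and $\tilde{\mathbf{P}}$ such that

$$\mathbf{P}(s,t) = \sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{P}_{ij}\, f_i(s)\, f_j^*(t) = \sum_{i=0}^{m}\sum_{j=0}^{n} \tilde{\mathbf{P}}_{ij}\, \tilde{f}_i(s)\, \tilde{f}_j^*(t) \qquad (8)$$

Then we have

$$\sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{P}_{ij}\, f_i(s)\, f_j^*(t) - \sum_{i=0}^{m}\sum_{j=0}^{n} \tilde{\mathbf{P}}_{ij}\, \tilde{f}_i(s)\, \tilde{f}_j^*(t) = 0 \qquad (9)$$

According to Theorem 1 in [6], the solution satisfies

$$\begin{pmatrix}
\sum_{i=0}^{m} \mathbf{P}_{i0}\, f_i(s) \\ \vdots \\ \sum_{i=0}^{m} \mathbf{P}_{in}\, f_i(s) \\
\sum_{i=0}^{m} \tilde{\mathbf{P}}_{i0}\, \tilde{f}_i(s) \\ \vdots \\ \sum_{i=0}^{m} \tilde{\mathbf{P}}_{in}\, \tilde{f}_i(s)
\end{pmatrix}
= \begin{pmatrix} \mathbf{P}^T \\ \mathbf{B} \end{pmatrix} \mathbf{f}^T(s) ;
\qquad
\begin{pmatrix}
f_0^*(t) \\ \vdots \\ f_n^*(t) \\ -\tilde{f}_0^*(t) \\ \vdots \\ -\tilde{f}_n^*(t)
\end{pmatrix}
= \begin{pmatrix} \mathbf{I} \\ \mathbf{C} \end{pmatrix} (\mathbf{f}^*(t))^T \qquad (10)$$

with

$$\begin{pmatrix} \mathbf{P} & \mathbf{B}^T \end{pmatrix} \begin{pmatrix} \mathbf{I} \\ \mathbf{C} \end{pmatrix} = 0 \iff \mathbf{P} = -\mathbf{B}^T \mathbf{C} \qquad (11)$$

From (10) we get

$$\tilde{\mathbf{P}}^T\, \tilde{\mathbf{f}}^T(s) = \mathbf{B}\, \mathbf{f}^T(s) ; \qquad (\tilde{\mathbf{f}}^*(t))^T = -\mathbf{C}\, (\mathbf{f}^*(t))^T, \qquad (12)$$

Expression (12) gives the relations between both equivalent solutions and
the degrees of freedom we have.

However, if we have to learn $\mathbf{f}(s)$ and $\mathbf{f}^*(t)$ we can approximate them as:

$$\mathbf{f}(s) = \boldsymbol{\phi}(s)\, \mathbf{B} ; \qquad \mathbf{f}^*(t) = \boldsymbol{\psi}(t)\, \mathbf{C}, \qquad (13)$$

and we get

$$\mathbf{P}(s,t) = \mathbf{f}(s) \cdot \mathbf{P} \cdot (\mathbf{f}^*(t))^T = \boldsymbol{\phi}(s) \cdot \mathbf{B} \cdot \mathbf{P} \cdot \mathbf{C}^T \cdot \boldsymbol{\psi}(t)^T = \boldsymbol{\phi}(s) \cdot \widehat{\mathbf{P}} \cdot \boldsymbol{\psi}(t)^T, \qquad (14)$$

with $\widehat{\mathbf{P}} = \mathbf{B}\mathbf{P}\mathbf{C}^T$, which is equivalent to (5) but with functions $\{\boldsymbol{\phi}(s), \boldsymbol{\psi}(t)\}$ instead of $\{\mathbf{f}(s),
\mathbf{f}^*(t)\}$. Thus, this case reduces to the first one.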

Step 5 (Data collection): This step has been already done in Section 1.3. 

Step 6 (Learning): At this point, the neural functions are estimated (learned)
by using some minimization method. In functional networks, this learning
process consists of obtaining the neural functions based on a set of data $D =
\{(I_i, O_i) \mid i = 1, \dots, n\}$ given in the previous step, where $I_i$ and $O_i$ are the
i-th inputs and outputs, respectively, and $n$ is the sample size.

Usually, the learning process is based on minimizing the sum of squared errors
between the actual and the observed outputs for the given inputs

$$Q = \sum_{i=1}^{n} \left(O_i - F(I_i)\right)^2, \qquad (15)$$

where $F$ is the compound function giving the outputs, as a function of the inputs,
for the given network topology. One learning alternative consists of approximating
each neural function $f_i$ by a linear combination of functions in a given family
$\{\phi_{i1}, \dots, \phi_{i m_i}\}$. Thus, the approximated neural function $\hat{f}_i(x)$ becomes

$$\hat{f}_i(x) = \sum_{j=1}^{m_i} a_{ij}\, \phi_{ij}(x), \qquad (16)$$

where $x$ are the inputs associated with the i-th neuron. Note that the above
function $F$ includes all the neural functions in the network, and therefore it
depends only on the coefficients $a_{ij}$, which are estimated in the learning process.

In the case of our example, the problem of learning the above functional
network reduces to estimating the neuron functions x(s,t), y(s,t) and z(s,t) from
a given sequence of triplets $\{(x_k, y_k, z_k),\ k = 1, \dots, 121\}$ which depend on s and
t so that $x(s_k, t_k) = x_k$ and so on. To this aim we build the sum of squared
errors function:

$$Q_\alpha = \sum_{k=1}^{121} \left( \alpha_k - \sum_{i=0}^{I}\sum_{j=0}^{J} a_{ij}\, \phi_i(s_k)\, \psi_j(t_k) \right)^2 \qquad (17)$$

where, in the present example, we should consider an error function for each
variable x, y and z. However, since we have only introduced a measurement error
into the z coordinate, eq. (17) must be interpreted as an equation for $\alpha = z$ only.
The optimum value is obtained when

$$-\frac{\partial Q_\alpha}{2\, \partial a_{rs}} = \sum_{k=1}^{121} \left( \alpha_k - \sum_{i=0}^{I}\sum_{j=0}^{J} a_{ij}\, \phi_i(s_k)\, \psi_j(t_k) \right) \phi_r(s_k)\, \psi_s(t_k) = 0 ; \qquad r = 0, \dots, I ; \quad s = 0, \dots, J. \qquad (18)$$

To fit the 121 data points of our example, we have used monomials in the s and
t variables for the functions $\{\phi_i(s) = s^i \mid i = 0, 1, \dots, I\}$ and $\{\psi_j(t) = t^j \mid j =
0, 1, \dots, J\}$ in (17). Of course, every different choice for I and J yields a
corresponding system (18), which must be solved. In particular, as the data
points come from a bicubic parametric surface, we have taken values for I and J
from 2 to 4. Solving the system (18) for all of these cases, we always obtain the
values 1 + 6s and 1 + 6t for x(s,t) and y(s,t) respectively (as expected, because
they are not affected by any perturbation). But, of course, the corresponding
approximation for z(s,t) depends on the I and J values.
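
The linear system (18) consists of the normal equations of a least-squares fit in the monomial basis. The following sketch is not the authors' code: it builds and solves that system with plain Gaussian elimination, using noise-free samples of the z coordinate of eq. (3) on an assumed equally spaced 11 x 11 parameter grid, so that the recovered coefficients can be compared with the known ones. All names are hypothetical.

```python
def true_z(s, t):
    """z coordinate of the bicubic surface, eq. (3)."""
    return (1 + 6*s - 9*s**2 + 8*s**3 + 6*t + 9*s*t - 45*s**2*t + 27*s**3*t
            - 9*t**2 - 45*s*t**2 + 171*s**2*t**2 - 120*s**3*t**2 + 7*t**3
            + 33*s*t**3 - 135*s**2*t**3 + 99*s**3*t**3)

def fit(data, I, J):
    """Solve the normal equations (18) for a[(i, j)], the coefficient of s^i t^j."""
    basis = [(i, j) for i in range(I + 1) for j in range(J + 1)]
    n = len(basis)
    A = [[0.0] * (n + 1) for _ in range(n)]       # augmented matrix [G | b]
    for s, t, z in data:
        phi = [s**i * t**j for i, j in basis]
        for r in range(n):
            for c in range(n):
                A[r][c] += phi[r] * phi[c]        # Gram matrix entries
            A[r][n] += phi[r] * z                 # right-hand side
    for col in range(n):                          # elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return {basis[k]: x[k] for k in range(n)}

grid = [k / 10 for k in range(11)]                # assumed regular grid on [0, 1]
data = [(s, t, true_z(s, t)) for s in grid for t in grid]
coef = fit(data, 3, 3)                            # coef[(3, 3)] is close to 99
```

Normal equations in a monomial basis become ill-conditioned as the degrees grow, so for larger I and J an orthogonal factorization would usually be preferred; for the degrees considered here the direct solve is adequate.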



Step 7 (Model validation): At this step, a test for quality and/or the cross
validation of the model is performed. Checking the obtained error is important
to see whether or not the selected family of approximating functions is adequate.
A cross validation of the model is also convenient.

To cross validate the model:

1. we have calculated the mean, the maximum and the root mean squared
(RMS) errors for the 121 training data points. The obtained results for
the different values of I and J are reported in Table 2. As the reader can
appreciate, the best choice corresponds to I = J = 3, for which the mean and
the RMS errors are 0.0055 and 0.0001 respectively. Since errors are small,
the selected approximating third degree bivariate polynomial was considered
adequate.

2. we have also used the fitted model to predict a new set of 1681 testing data
points, and calculated the mean, the maximum and the root mean squared
(RMS) errors, obtaining the results shown in Table 3. Once more, the smallest
error is obtained for I = J = 3. A comparison between mean and RMS
error values for the training and testing data shows that, for this choice,
they are comparable. Thus, we can conclude that no overfitting occurs. Note
that a variance for the training data significantly smaller than the variance
for the testing data is a clear indication of overfitting. This does not occur
here.

TRAINING POINTS

                 J = 2       J = 3       J = 4
I = 2   Mean    0.1641     10.8169      0.1399
        Max     0.6731     27.9846      0.5128
        RMS     0.0194      1.2192      0.0163
I = 3   Mean   10.8169      0.0055      0.0072
        Max    27.9846      0.0405      0.0443
        RMS     1.2192      0.0001      0.0009
I = 4   Mean    5.9127     65.8192      0.0079
        Max    32.4473    335.8790      0.0503
        RMS     0.8665      9.5675      0.0009

Table 2. Mean, maximum and root mean squared errors of the 121 training
points for different values of I and J.

TESTING POINTS

                 J = 2       J = 3       J = 4
I = 2   Mean    0.1443     10.5287      0.1205
        Max     0.6731     27.9846      0.5128
        RMS     0.0044      0.3131      0.0036
I = 3   Mean    1.6696      0.0055      0.0065
        Max    10.748       0.0405      0.04516
        RMS     0.0580      0.0001      0.0002
I = 4   Mean    5.6439     62.3208      0.0068
        Max    32.4474    335.879       0.0515
        RMS     0.2171      2.3869      0.0002

Table 3. Mean, maximum and root mean squared errors of the 1681 testing
points for different values of I and J.
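
The three error measures used for validation can be computed as in the short sketch below. The definitions are assumptions: the text does not spell them out, so the conventional ones are used here (mean absolute error, maximum absolute error, and the square root of the mean squared error). With these conventions the RMS error can never be smaller than the mean error, whereas the tabulated RMS values are smaller than the means, so the authors may have used a different normalization.

```python
from math import sqrt

def error_summary(predicted, observed):
    """Return mean, maximum and RMS of the absolute errors (conventional definitions)."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "max": max(errors),
        "rms": sqrt(sum(e * e for e in errors) / n),
    }

# Hypothetical values, only to show the call.
summary = error_summary([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Applying the same function to the training and to the testing predictions gives the kind of side-by-side comparison used above to rule out overfitting.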

As a conclusion, we have obtained I = J = 3 as the best choice for fitting
the data points. In this case, the approximate bivariate polynomial for z(s,t) is
given by:

$$\begin{aligned}
z(s,t) = {} & 0.995522 + 6.07021\,s - 9.26584\,s^2 + 8.24071\,s^3 + 5.96881\,t \\
& + 10.0693\,st - 47.0224\,s^2t + 27.6718\,s^3t - 8.86878\,t^2 \\
& - 48.6516\,st^2 + 178.817\,s^2t^2 - 123.711\,s^3t^2 + 6.89967\,t^3 \\
& + 35.7064\,st^3 - 141.16\,s^2t^3 + 102.271\,s^3t^3
\end{aligned} \qquad (19)$$

where 6-digit precision has been used for calculating all the coefficients.
Comparison of these results with (3) indicates that we have obtained a good
approximation to the true surface. This fact is illustrated in Figure 2, where the fitted
surface and the data points are shown. Figure 3 (left) shows the errors associated
with the data points on the grid for such a surface.




Fig. 2. Fitted surface and used data points. 




Fig. 3. (left) Error surface (error measured at data points $\mathbf{T}_{pq}$ on the grid);
(right) Quality function values (see eq. (20)) for different choices of I and J.



In spite of the previous results, it could be argued that only the best choice
for I and J within the set {2,3,4} has been obtained, and perhaps higher degrees
might lead to a better fitting of the data points. The discussion of higher degrees
is not included here because of limitations of space. It is enough to say that this
is not the case; on the contrary, there is an optimal value for the degrees I and J,
and smaller or higher values just increase both the mean and the RMS errors. As an
illustration, we have calculated the values of a quality function, QF, given by:

$$QF = 100 \left( 1 - \frac{\left| RMS(TrP) - RMS(TeP) \right|}{RMS(TrP)} \right) \qquad (20)$$

where TrP and TeP indicate the training and the testing points, respectively, as
a function of the degrees I and J. This function QF returns a number p on the
interval [0, 100], which can be interpreted as a measure of the “goodness” of the
approximation of the original Bezier surface (given by eq. (3)) by the resulting
bivariate polynomial. Thus, the closer to 100 this value is, the more similar these
surfaces are.

Figure 3 (right) shows the obtained results. Clearly, the choice I = J = 3 is
indeed associated in practice with a value of 100. This implies that the surface
obtained in (19) reproduces the original one with very small errors, as shown in
Figure 3 (left).

From Figure 3 (right) it becomes clear that this quality function is particularly
appropriate for determining the optimal value for I and J. Different values
for I or J lead to a significantly larger error. So, this method can be used to
approximate a Bezier surface (a parametric tensor product surface, in general)
by polynomials, an important issue in CAGD.

Step 8 (Use of the model): Once the model has been satisfactorily validated, 
it is ready to be used in predicting new points on the surface. 



4 Conclusions and Recommendations 

In this paper, functional networks, a powerful extension of neural networks, are 
applied to fit a given set of data from a tensor product surface, a very important 
geometric entity in CAGD. As an example, we consider here a Bezier surface, 
which is approximated by using a monomial basis family. The method proposed 
in this paper is useful not only to get the coefficients of the approximating 
polynomial surface but also to determine the optimal degree of such a surface, 
in the sense that it minimizes the error measured at data points. From this point 
of view, functional networks provide a good tool to describe Bezier surfaces by 
using the monomial basis. Furthermore, since we are free to choose the basis
family for the approximation, our assertions can be generalized to the cases in 
which the Bezier surface is approximated by any other family of functions, such 
as trigonometric functions, Hermite polynomials, Laguerre polynomials, etc. 

Finally, we remark that the scheme presented here is very general. In fact, 
we can apply these ideas to approximate any other interesting family of tensor 
product surfaces in CAGD, such as B-splines. The problem is currently under 
investigation and the obtained results will be reported elsewhere. 

Abstract. Computer-Aided Geometric Design (CAGD) is one of the
most important fields in Computer Graphics. Usually, CAGD is handled 
in traditional programming languages, such as Fortran, Pascal or C. By 
contrast, this paper supports the idea that Symbolic Computation Sys- 
tems (SCS) should be used instead. To this aim, the paper shows how 
some mathematical expressions for Bezier curves and surfaces can be 
easily translated to the Mathematica programming language. Then, they 
are used to prove symbolically some mathematical properties related to 
these geometric entities. 



The term Computer-Aided Geometric Design (CAGD) was coined by R. E.
Barnhill and R. F. Riesenfeld in 1974 to describe the more mathematical aspects
of Computer-Aided Design (CAD). Since then, CAGD has become one of the most
important fields in Computer Graphics [1,2]. Usually, CAGD is handled in
traditional programming languages, such as Fortran, Pascal or C. However, the
appearance of Symbolic Computation Systems (SCS), such as Mathematica
or Maple, opens new and exciting possibilities.

Recently, a Mathematica [7] package to deal with Bezier curves and surfaces 
(see [5] for a description), one of the most important topics in CAGD, has been 
implemented by the author [3,4]. In this paper, this package is used to show how 
symbolic computation can be successfully applied to CAGD. 

The first task to be done in this process is to represent the geometric entities 
by computer. This representation cannot be chosen at random; on the contrary, 
it must satisfy some conditions: it should be clear, unambiguous and easy to ma- 
nipulate. In this sense, one of the most remarkable SCS capabilities is the ability 
to represent expressions in a compact, easy and intuitive way. Table 1 shows 
how some mathematical expressions can be easily translated to the Mathemat- 
ica programming language. As the reader can appreciate, the powerful symbolic 
Mathematica capabilities allow shorter, simpler and more elegant codes, which 
simply reproduce the mathematical structure of the equation to implement. In 
most cases, these capabilities include pattern recognition and object-oriented 
programming features [6].

For instance, the patterns weights_List and weights_?MatrixQ identify vectors
and matrices respectively, the pattern pts:{{_,_}..}|{{_,_,_}..} is applied
to represent an arbitrary number of either two- or three-dimensional control
points and the pattern pts:{{{_,_,_}..}..} represents a matrix (with an
arbitrary number of rows and columns) of three-dimensional points.

Bernstein polynomial: $B_i^n(t) = \binom{n}{i}\, t^i (1-t)^{n-i}$

    Bernstein[i_, n_, t_] :=
      Binomial[n, i] t^i (1 - t)^(n - i)

Bezier curve: $\mathbf{B}(t) = \sum_{i=0}^{n} \mathbf{P}_i\, B_i^n(t)$

    BezierCurve[pts:{{_,_}..}|{{_,_,_}..}, t_] :=
      Module[{n = Length[pts] - 1},
        (* the vector of basis functions dotted with the control points *)
        Simplify[
          Table[Bernstein[i, n, t], {i, 0, n}] . pts]
      ]

Bezier surface: $\mathbf{S}(s,t) = \sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{P}_{ij}\, B_i^m(s)\, B_j^n(t)$

    BezierSurface[pts:{{{_,_,_}..}..}, {s_, t_}] :=
      Module[{m = Length[pts] - 1,
              n = Length[First[pts]] - 1, U, V},
        (* U and V hold the basis functions in s and t *)
        {U, V} = MapThread[Table[Bernstein[i, #1, #2],
                   {i, 0, #1}] &, {{m, n}, {s, t}}];
        Plus @@ (U . pts*V) // Simplify]

Rational Bezier curve: $\mathbf{B}(t) = \dfrac{\sum_{i=0}^{n} w_i\, \mathbf{P}_i\, B_i^n(t)}{\sum_{i=0}^{n} w_i\, B_i^n(t)}$

    RationalBezierCurve[pts:{{_,_}..}|{{_,_,_}..}, weights_List, t_] :=
      Module[{n = Length[pts] - 1, lisfun},
        (* one weight per control point is required *)
        If[Length[pts] == Length[weights],
          lisfun = Table[Bernstein[i, n, t], {i, 0, n}];
          Simplify[(Plus @@ (pts*weights*lisfun))/
                   (lisfun . weights)],
          Message[RationalBezierCurve::badnum]]]

Rational Bezier surface: $\mathbf{S}(s,t) = \dfrac{\sum_{i=0}^{m}\sum_{j=0}^{n} w_{ij}\, \mathbf{P}_{ij}\, B_i^m(s)\, B_j^n(t)}{\sum_{i=0}^{m}\sum_{j=0}^{n} w_{ij}\, B_i^m(s)\, B_j^n(t)}$

    RationalBezierSurface[pts:{{{_,_,_}..}..},
        weights_?MatrixQ, {s_, t_}] :=
      Module[{m = Length[pts] - 1,
              n = Length[First[pts]] - 1, U, V},
        (* the weight matrix must match the control net *)
        If[Take[Dimensions[pts], 2] ==
           Take[Dimensions[weights], 2],
          {U, V} = MapThread[
                     Table[Bernstein[i, #1, #2],
                       {i, 0, #1}] &, {{m, n}, {s, t}}];
          Plus @@ (U . (pts*weights)*V)/
            (Plus @@ (U . weights*V))
            // Simplify,
          Message[
            RationalBezierSurface::badnum]]]

Table 1. Examples of translation of some mathematical entities and expressions
used in CAGD to their equivalent symbolic Mathematica commands. Vectors
are denoted in bold.



Partition of unity: $\sum_{i=0}^{n} B_i^n(t) = 1$

    PowerExpand[
      FullSimplify[
        Sum[Bernstein[i, n, t], {i, 0, n}]
      ]
    ]

Symmetry: $B_i^n(t) = B_{n-i}^n(1-t)$

    rule = Binomial[n_, n_ - i_] -> Binomial[n, i];
    Bernstein[n - i, n, 1 - t] /. rule

Linear precision for Bernstein polynomials: $\sum_{i=0}^{n} \dfrac{i}{n}\, B_i^n(t) = t$

    PowerExpand[
      FullSimplify[
        Sum[(i/n)*Bernstein[i, n, t], {i, 0, n}]
      ]
    ]

Linear precision for Bezier curves: $\sum_{i=0}^{n} \left[ \left(1 - \dfrac{i}{n}\right) \mathbf{p} + \dfrac{i}{n}\, \mathbf{q} \right] B_i^n(t) = (1-t)\, \mathbf{p} + t\, \mathbf{q}$

    Collect[
      PowerExpand[
        FullSimplify[
          Sum[((1 - i/n)*p + (i/n)*q)*Bernstein[i, n, t],
            {i, 0, n}]
        ]
      ],
      p]

Table 2. Examples of some mathematical properties and how they
can be proved by using Mathematica.



In addition, all the expressions are manipulated in a symbolic way avoiding 
the spurious behavior and round-off errors obtained when numerical methods 
are applied. For example, given a set of three-dimensional control points and 
their corresponding weights: 

In[1]:= pts = {{{0,0,0},{2,0,3},{4,0,0}},
               {{0,2,0},{2,2,3},{4,2,2}}};
        weights = {{1,4,5},{1,2,3}};

the command

In[2]:= RationalBezierSurface[pts, weights, {u,v}]

$$\text{Out[2]} = \left\{ \frac{4v\,(4 - 2u + v)}{1 + (6 - 4u)\,v + 2(-1 + u)\,v^2},\ \ \frac{2\,(u + 2uv)}{1 + (6 - 4u)\,v + 2(-1 + u)\,v^2},\ \ \frac{6v\,(4 - 4v + u(-2 + 3v))}{1 + (6 - 4u)\,v + 2(-1 + u)\,v^2} \right\}$$

returns the mathematical expression of the corresponding rational Bezier surface.
Note that this output corresponds to the symbolic equation of a parametric rational
function of degrees (1, 2) in directions (u, v), as the input is given by 2 x 3 control
points.
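
The symbolic output can be cross-checked with exact rational arithmetic. The sketch below is an independent Python re-implementation of the rational Bezier surface formula (not the author's Mathematica package); it evaluates the surface for the same control points and weights at the arbitrarily chosen parameter values u = 1/2, v = 1/3.

```python
from fractions import Fraction
from math import comb

def bernstein(i, n, t):
    """Bernstein polynomial B_i^n(t); exact when t is a Fraction."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def rational_bezier_surface(pts, weights, u, v):
    """Weighted average of the control points with weights w_ij B_i^m(u) B_j^n(v)."""
    m, n = len(pts) - 1, len(pts[0]) - 1
    dim = len(pts[0][0])
    basis = [[weights[i][j] * bernstein(i, m, u) * bernstein(j, n, v)
              for j in range(n + 1)] for i in range(m + 1)]
    denom = sum(sum(row) for row in basis)
    return tuple(
        sum(basis[i][j] * pts[i][j][d]
            for i in range(m + 1) for j in range(n + 1)) / denom
        for d in range(dim))

pts = [[(0, 0, 0), (2, 0, 3), (4, 0, 0)],
       [(0, 2, 0), (2, 2, 3), (4, 2, 2)]]
weights = [[1, 4, 5], [1, 2, 3]]

u, v = Fraction(1, 2), Fraction(1, 3)   # arbitrary test parameters
point = rational_bezier_surface(pts, weights, u, v)

# The closed-form components of Out[2], evaluated at the same parameters.
denominator = 1 + (6 - 4*u)*v + 2*(-1 + u)*v**2
closed_form = (4*v*(4 - 2*u + v) / denominator,
               2*(u + 2*u*v) / denominator,
               6*v*(4 - 4*v + u*(-2 + 3*v)) / denominator)
```

Because `Fraction` arithmetic is exact, `point` and `closed_form` can be compared with `==` without any tolerance.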

We can take advantage of the symbolic results to evaluate the curves or sur- 
faces with infinite precision, or determine some of their mathematical properties. 
Table 2 shows some properties of Bernstein polynomials and Bezier curves and 
how they can be proved by using Mathematica. 
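
Outside a symbolic system, the properties of Table 2 can at least be spot-checked with exact rational arithmetic. The following Python sketch verifies partition of unity, symmetry and linear precision for a few degrees and parameter values; unlike the symbolic computation, it is a check on sample values and not a proof. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernstein(i, n, t):
    """Bernstein polynomial B_i^n(t); exact when t is a Fraction."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def check_properties(n, t):
    """Return (partition of unity, symmetry, linear precision) for degree n at t."""
    unity = sum(bernstein(i, n, t) for i in range(n + 1)) == 1
    symmetry = all(bernstein(i, n, t) == bernstein(n - i, n, 1 - t)
                   for i in range(n + 1))
    linear = sum(Fraction(i, n) * bernstein(i, n, t) for i in range(n + 1)) == t
    return unity, symmetry, linear

# Degrees 1..8 and parameter values 0, 1/7, ..., 1.
results = [check_properties(n, Fraction(k, 7))
           for n in range(1, 9) for k in range(8)]
all_hold = all(all(r) for r in results)
```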

Furthermore, the SCS graphical capabilities can also be applied to visualize 
the geometry of the objects under analysis. Figure 1 shows a Bezier curve (left) 
and a surface (right) associated with two different sets of control points. 





Fig. 1. Examples of: (left) a Bezier curve; (right) a Bezier surface. 

Abstract. A rewriting-based method to design circuits on FPLA electronic
devices is presented. It is an improvement of our previous work:
compared with the latter, the number of boolean vectors generated
during the design process is reduced. This is achieved thanks to new forms
of rewriting rules that express new properties of boolean vectors
associated with boolean products. Only boolean products which are
implicants of the circuit to design are computed. Thus, this new design
process is more efficient than the previous one.



1 Introduction 

FPLA (Field Programmable Logic Array) devices are the core of complex
circuits such as random logic circuits, interface logic, and other applications that
require decoding of device inputs. The aim of this work is to design logical circuits
for this kind of device.

It is the first rewriting based method dealing with designing circuits on FPLA 
chips [1]. It is a correct and complete method, in the sense that, if solutions of 
the design exist, they will be deduced by our method within a finite period of 
time. And if no solution exists, then within a finite period of time, a signal 
of failure is also displayed. Let us note that theoretically, neither simulation 
nor validation of the computed solutions is needed. Thanks to the formalism of 
constraint we used, all the properties to be satisfied by the solution are specified 
in the constraints of the rewrite rules, and they have to be considered in all the 
steps of the process. Such properties could be conditions on the layout of the 
logical gates in the device, for example. This approach has been implemented in 
our prototype CDR (Circuit Design by Rewriting). 

New properties of boolean vectors representing boolean products are stated. 
To the best of our knowledge, we are the first to present all those properties. 
They will be used to redefine our new system of constrained rewriting rules. 



2 Boolean Vectors 

We deal with basic notions on boolean algebra [4, 9, 8, 5]. The novelty of this
approach consists of handling the boolean functions in the sum of products
normal form by a vectorial representation, unlike the classical approaches where an
algebraic representation is used. With this representation, we prove interesting
new properties that will be used to perform the design. It is independent of any
design method, and can be used to improve the classical ones.

A boolean vector $\vec{v} = (v_1, \ldots, v_n)$ is any n-tuple of values all in the
set {0, 1}. Let $\mathbb{B}^n$ be the set of all boolean vectors of size $2^n$. For example,
$\mathbb{B}^2 = \{(0000), (0001), (0010), (0011), (0100), (0101), (0110), (0111), (1000),
(1001), (1010), (1011), (1100), (1101), (1110), (1111)\}$. n is the dimension of a
vector of $\mathbb{B}^n$, and $2^n$ its size. Let us now define the following notions in order
to introduce these new properties:



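- the split operation: $\forall n \geq 2$,

\[ Split : \mathbb{B}^n \to \mathbb{B}^2 \times \mathbb{B}^2 \times \cdots \times \mathbb{B}^2, \qquad
   \vec{v} \mapsto ((v_1, v_2, v_3, v_4), \ldots, (v_{2^n-3}, v_{2^n-2}, v_{2^n-1}, v_{2^n})) \]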



- Let $\vec{u} = (u_1, \ldots, u_{2^n})$ and $\vec{v} = (v_1, \ldots, v_{2^n})$ be two vectors in $\mathbb{B}^n$:

\[ \vec{v} \leq \vec{u} \iff \forall i \in [1 \ldots 2^n],\ \text{if } u_i = 0 \text{ then } v_i = 0. \]

Let us note that in the literature [8, 5], $\leq$ is called implication. When
$\vec{v} \leq \vec{u}$, we say that $\vec{v}$ implies $\vec{u}$; we also say that $\vec{v}$ is an
implicant of $\vec{u}$.

In the same way, we consider all the vectors of $\mathbb{B}^n$ corresponding to products.
We call them product vectors of dimension n. In order to distinguish between the
two notions, vectorial and syntactic, we represent a syntactic form by a pattern p
without an arrow, and its vectorial value by $\vec{p}$.



2.1 Recursive Properties on Products 

The following theorem is an interesting result on the boolean algebra in the 
sense that it is independent of any design method. It makes up the basis of our 
method. 

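Theorem 1. Let $\vec{v}$ be a product vector of dimension n such that
$Split(\vec{v}) = (\vec{b}_1, \ldots, \vec{b}_k)$.

1. $\forall i, j \leq k$, if $\vec{b}_i \neq (0000)$ and $\vec{b}_j \neq (0000)$ then $\vec{b}_i = \vec{b}_j$.

2. $\forall i \leq k$, $\vec{b}_i$ is a product vector of dimension 2.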
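The recursive property can be checked on small examples. The following Python sketch is only an illustration and is not part of CDR: the helper names (`product_vector`, `split`, `satisfies_theorem_1`) are ours, and we assume that the rows of the truth table are enumerated with x_1 as the most significant bit, so that each block of four entries corresponds to the last two variables.

```python
from itertools import product as assignments

ZERO = (0, 0, 0, 0)

def product_vector(literals, n):
    """Truth-table column (length 2**n) of a product of literals.

    `literals` maps a 1-based variable index to the value it must take
    (1 for x_i, 0 for its complement). Row ordering is an assumption:
    x_1 is the most significant bit.
    """
    return tuple(
        int(all(row[i - 1] == value for i, value in literals.items()))
        for row in assignments((0, 1), repeat=n)
    )

def split(v):
    """Cut a vector of size 2**n into consecutive blocks of size 4."""
    return [tuple(v[i:i + 4]) for i in range(0, len(v), 4)]

def satisfies_theorem_1(v):
    """All non-zero blocks coincide and are product vectors of dimension 2."""
    nonzero = {block for block in split(v) if block != ZERO}
    dimension_2_products = {
        product_vector({i + 1: c for i, c in enumerate(choice) if c is not None}, 2)
        for choice in assignments((None, 0, 1), repeat=2)
    }
    return len(nonzero) <= 1 and nonzero <= dimension_2_products

v = product_vector({1: 1, 3: 0, 4: 1}, 4)   # x1 * not(x3) * x4
print(split(v))
print(satisfies_theorem_1(v))
```

For this product the first two blocks are (0000) and the last two blocks are both (0100), which is the vector of the product not(x_1) * x_2 in dimension 2.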



3 The Rewriting Approach 

We use the paradigm of conditional rewriting [3] to perform the circuit design.
The formalism of constraints used here is also the usual one; for more details
see [6] and [7]. All the notations used here are the same as in our previous work
[2]. A constrained conditional rule is written $C \wedge L \Rightarrow s = t$, in which C is a list
of constraints to be solved by using algorithms in the predefined algebras, or
by unification, and L is a list of equations to be solved by rewriting techniques.
When the L part does not exist, we call the rule an equation. All the inference
rules of the maximal-unit-strategy described in [2] are included in our system.
The main inference rules we use are the deduction of new rules by conditional
narrowing, and the simplification of conditional rules and inequations. Let us
point out that this inference rule also works between an inequation (a pure
negative conditional rule) and an equation. As usual in refutation-based
methods, we infer in our system until the empty clause is generated. The proof
of this empty clause yields the design of the circuit.



4 The Design Specification 

The specification of the circuit is divided into three main parts. The axioms 
represent the inputs, called also equations. All the outputs are represented by 
a pure negative clause, and finally five conditional rules with the corresponding 
constraints specify the design of the FPLA and the chip specification. As usual 
in refutation based methods, CDR infers new clauses until the empty clause is 
generated. The proof of this empty clause yields the design of the circuit. 

All the axioms are ground equations of the form $P\$(out(\vec{v}), 2) = tt$, where
$\vec{v}$ is a product vector of dimension 2, the second argument of P$.

There are 5 conditional rules specifying the design of the FPLA device. Three
of those rules deduce product vectors, and the remaining ones deduce the
output sums. For example, the first conditional product rule is as follows:
\[ [\, r \neq 1 \wedge r < n - 2 \,] \wedge P\$(out(\vec{v}), 2) = tt \wedge P\$(out(t), r) = tt
   \;\Rightarrow\; P\$(out(ab(\vec{v}, t)), r + 2) = tt \]

This rule exploits Theorem 1 in order to deduce recursively product vectors
of higher dimension. The operator ab keeps the resulting product vector in
a compact form, without expanding it into its vectorial representation. This
reduces memory consumption.

The following rule performs a sum between a product and an already deduced 
sum. This deduction is performed only through two conditions as shown in the 
constraint part. 

\[ [\, \vec{v} = simp(t, n) \wedge \vec{v} \not\leq \vec{u} \wedge x \prec y \,] \wedge P\$(out(t), n) = tt \wedge S\$(out(\vec{u})) = tt
   \;\Rightarrow\; S\$(out(or(\vec{v}, \vec{u}))) = tt \]

The constraint $\vec{v} \not\leq \vec{u}$ forbids redundant sums. For example, if the $x_i$'s are
boolean variables, then the sum between $x_1x_2x_3$ and $x_1x_2 + x_2x_3$ is not to be
performed, because $x_1x_2x_3 \leq x_1x_2$. The second constraint $x \prec y$ means that the
syntactic representation of x must be syntactically smaller than y with respect
to an arbitrary lexicographic order <, defined on the algebraic representation
of products and extended over the sums. The vectors $\vec{v}$ and $\vec{u}$ belong
to the predefined algebra, thus all the information corresponding to them
is wired like integers and booleans. They are in the low hierarchy level [2]. For
example, $x_1x_2 < x_2x_3 + x_1'x_3$ while $x_3 \not< x_2x_3 + x_1'x_3$. This constraint allows
us to reduce the search space by avoiding the deduction of the same
sum (i.e. the same semantic value) in syntactically different forms.
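The redundancy test on sums can be illustrated on the example above. The following Python sketch is not the CDR implementation: the helpers `column`, `is_implicant` and `vector_or` are hypothetical names, and the truth-table row ordering (x_1 as the most significant bit) is an assumption.

```python
from itertools import product as assignments

def column(term, n):
    """Truth-table column of `term`, a function of an n-tuple of 0/1 values."""
    return tuple(int(bool(term(row))) for row in assignments((0, 1), repeat=n))

def is_implicant(v, u):
    """v implies u: wherever u is 0, v is 0 as well."""
    return all(vi == 0 for vi, ui in zip(v, u) if ui == 0)

def vector_or(v, u):
    """Componentwise sum (logical or) of two boolean vectors."""
    return tuple(a | b for a, b in zip(v, u))

n = 3
p = column(lambda x: x[0] and x[1] and x[2], n)                 # x1 x2 x3
s = column(lambda x: (x[0] and x[1]) or (x[1] and x[2]), n)     # x1 x2 + x2 x3
q = column(lambda x: x[0] and x[2], n)                          # x1 x3

print(is_implicant(p, s), vector_or(p, s) == s)   # redundant sum: nothing new
print(is_implicant(q, s), vector_or(q, s) == s)   # useful sum: a new vector
```

When the product is an implicant of the sum, adding it leaves the vector of the sum unchanged, which is why the constraint can discard such deductions.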
Finally, the outputs are represented by an inequation. All these outputs are
the subterms of the variadic term A$(...). The inequation is as follows:

\[ A\$(S\$(out(\vec{v}_1)), \ldots, S\$(out(\vec{v}_k))) \neq tt \]

where $\vec{v}_1, \ldots, \vec{v}_k$ are the column vectors of the truth table. Once all the
subterms S$(...) in A$(...) are reduced to tt, the second standard rewrite rule
reduces the negative clause to the nil one, thus we get the refutation.

5 Conclusion 

In comparison with our previous work [1], thanks to those new properties
combined with a goal-oriented deletion criterion, the number of clauses generated
is drastically reduced. This second version of CDR is more efficient than the
first one. All the product vectors are stored in a particular data structure,
like BDDs [?]. The preliminary statistics show that memory consumption is
reduced to 30% with respect to the first version of CDR. Moreover, thanks to the
constraints introduced in the product and sum rules, as presented above, only
implicants of the outputs are deduced. In some cases, the number of these deduced
products is halved with respect to the previous version. Unfortunately,
this saving is obtained only for the product vectors, due to the main property
stated in Theorem 1. We think that additional improvements of our approach are
still possible, especially for the deduction of sum vectors.

Abstract. The concept of locally effective objects was introduced by
Sergeraert in the field of Effective Algebraic Topology, where this tool 
was used to represent potentially infinite data structures. This notion, 
borrowed from symbolic computation, was later used to produce, in an 
innovative way, code implementing the well-known search algorithms in 
Artificial Intelligence. In this paper, we show how these implementa- 
tions can be appropriately reinterpreted and specified, by using some 
recent advances in the algebraic specification setting. As a by-product, 
the concept of locally effective graphs provides a framework in which the 
Production Systems and the State Space programming metaphors can be 
formally integrated. 



In his very general proposal to deal with the problem of computability in Alge- 
braic Topology, Sergeraert introduced the notion of locally effective objects [7]. 
This concept allows the programmer to work with potentially infinite data struc- 
tures. This idea was put into practice by Sergeraert and the author to develop 
a Common Lisp symbolic computation system called EAT (Effective Algebraic 
Topology). EAT computes homology groups of iterated loop spaces [8] leading 
to results which had not previously been calculated (either by hand or by com- 
puter). 

The relevance of locally effective matters was subsequently explored by the 
author and coworkers, following two different lines. In the first one, algebraic 
specification tools were used to analyze the mathematical properties of locally 
effective objects in a Category Theory setting (see [4]). In the second one, the 
notion of locally effective graphs was used to develop Common Lisp programs 
which implement, in an innovative and very generic way, the well-known algo- 
rithms for searching in state-spaces (see [6]). 

This work stems from the confluence of these two lines. The results relating 
to algebraic specifications obtained in [4] are extended to the case of locally 
effective graphs. These theoretical results lead to some interesting consequences 
for interpreting the Common Lisp code developed in [6]. Our approach shows 
how some high-level mathematical tools (such as Category Theory) produce 

* Partially supported by DOES, project PB98-1621-C02-01

important (and natural) consequences in the design of very concrete software
systems (such as, in our case, generic search procedures in Artificial Intelligence). 

The standard perspective in algebraic specifications (that is to say, the ini- 
tial semantics approach) is not convenient for studying locally effective objects. 
(See [5], for example, for elementary definitions of algebraic specifications.) The 
suitable context in which locally effective matters can be dealt with is a recently 
discovered setting known as hidden specifications [3]. As the signature leGraph
for locally effective graphs, we have chosen for the visible part the usual
signature for (effective) lists (with main sort lst) and for the hidden part
(with grp as the only hidden sort):



    vrt-eql    : grp vrt vrt -> bool
    adj-lst    : grp vrt     -> lst
    vrt-goal   : grp vrt     -> bool
    heuristics : grp vrt     -> nat
    edge-cost  : grp vrt vrt -> nat



The first two operations encode the graph itself. The relevant information on 
the vertices set is the equality test vrt-eql, while the knowledge of the edges set 
is represented by adj-lst in the form of adjacency lists. The last three operations 
store information for the search process: vrt-goal is the termination test, heuris- 
tics estimates the relevance of each vertex and edge-cost is intended to evaluate 
the cost of each edge. 

Let us briefly explain why Sergeraert called this kind of data structures locally 
effective objects. According to Sergeraert's terminology [7], an effective object is
the representation in computer memory of a usual finite datum, such as a list 
or a graph. For instance, to deal with an effective graph we should add to the 
previous signature a complete set of construction operations (to construct the 
empty graph, to adjoin a vertex or an edge, and so on). Then, standard algebraic
specification techniques [5], such as initial semantics, are enough to formally
study these data. Even in the absence of constructors, we can manage the same 
family of data. For instance, in the example, this can be achieved by adjoining 
an operation vrts : grp -> lst, collecting the vertices set of the graph. The initial
semantics of the new signature is void, but we are able to reconstruct from the 
corresponding data structures any of the graphs belonging to the initial object. 
Thus, they can also be called effective graphs. Obviously, infinite graphs cannot 
be specified by any of the two last signatures. Let us go back to the signature 
leGraph. Obviously, the signature does not allow us to construct instances of
leGraph. However, this signature can be implemented (see [4]) and we are able to
work with leGraphs in a computer. For instance, the (infinite) graph where each 
natural number has as adjacency list its set of (proper) divisors can be directly 
implemented. Evidently, such graphs can be used in an algorithmic way (in the 
example, an algorithm for computing the gcd of two numbers, without using 
any arithmetical knowledge, can be easily designed), but the kind of information 
which is accessible from them is very different and much poorer than that of 
effective objects. Only local information is available. If a vertex is given, one can 
determine whether it is a goal. If two vertices are given, one can ask whether
one is a (direct) descendant of the other. But no global information is available.
For instance, it is undecidable whether the graph is connected or even whether 
the graph is empty. To understand the meaning of this last claim, let us remark 
that the example covers an infinite graph (thus, it is trivially non-empty), but, 
in fact, the main characteristic of locally effective objects is that any informa- 
tion about cardinality is missing. In general, locally effective objects will be used 
when the underlying space is infinite but also when it is so huge that no explicit 
storing would be sensible. This situation appears in a natural way in the field of 
Symbolic Computation in Algebraic Topology, but also, as it is well-known, in 
the field of Artificial Intelligence (combinatorial explosion).
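The divisor graph mentioned above can be sketched as follows. This Python fragment is only an illustration under our own assumptions, not the Common Lisp code of [4] or [6]: the class `DivisorGraph` and the helpers `member`, `reachable` and `gcd_by_search` are hypothetical names, and only the first two operations of the signature are implemented. The search procedures use nothing but vrt_eql and adj_lst, so the gcd is obtained from local information only.

```python
class DivisorGraph:
    """Infinite graph on the positive integers: the adjacency list of a
    vertex is the list of its proper divisors. An instance plays the role
    of a value of the hidden sort grp."""

    def vrt_eql(self, u, v):
        return u == v

    def adj_lst(self, v):
        return [d for d in range(1, v) if v % d == 0]

def member(g, v, vertices):
    """Membership test that relies only on the equality test of the graph."""
    return any(g.vrt_eql(v, w) for w in vertices)

def reachable(g, start):
    """Vertices reachable from `start` (including it), by local exploration."""
    seen, todo = [start], [start]
    while todo:
        v = todo.pop()
        for w in g.adj_lst(v):
            if not member(g, w, seen):
                seen.append(w)
                todo.append(w)
    return seen

def gcd_by_search(g, a, b):
    """The common descendant from which every other common descendant
    can be reached; no arithmetic on a and b is performed here."""
    below_b = reachable(g, b)
    common = [v for v in reachable(g, a) if member(g, v, below_b)]
    for d in common:
        below_d = reachable(g, d)
        if all(member(g, c, below_d) for c in common):
            return d
    raise ValueError("no common descendant")

print(gcd_by_search(DivisorGraph(), 12, 18))
```

The exploration terminates here because every path of proper divisors is finite, even though the graph itself is infinite and is never stored.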

But before talking about applications, let us go back to the formal analysis 
of hidden signatures [3] such as leGraph. Let us denote by HAlg(leGraph) the
hidden category of leGraph-algebras (see [3]). Then a functional description of a
final object in HAlg(leGraph) is given in the following result.



Theorem 1. There exists a final object I in HAlg(leGraph) in which:

\[ I(grp) := \{ [f_{\text{vrt-eql}}, f_{\text{adj-lst}}, f_{\text{vrt-goal}}, f_{\text{heuristics}}, f_{\text{edge-cost}}] \}, \]

where each element in I(grp) is a tuple of functions so that:
$f_{\text{vrt-eql}} : D_{vrt} \times D_{vrt} \to D_{bool}$, $f_{\text{adj-lst}} : D_{vrt} \to D_{lst}$,
$f_{\text{vrt-goal}} : D_{vrt} \to D_{bool}$, $f_{\text{heuristics}} : D_{vrt} \to \mathbb{N}$,
$f_{\text{edge-cost}} : D_{vrt} \times D_{vrt} \to \mathbb{N}$.

The definition of the operations in I is the natural one. For instance,

\[ I(\text{vrt-eql})([f_{\text{vrt-eql}}, \ldots], v_1, v_2) := f_{\text{vrt-eql}}(v_1, v_2). \]



The object described in the last theorem is very close to certain formalisms for 
modeling object-oriented programming, based on Cardelli’s metaphor of “objects 
as records of functions” (see [1]). This relationship is quite natural because one of
the explicit goals of hidden specifications is also to find formal models for object- 
oriented concepts (see [3]). This result can also be interpreted as a specially easy 
presentation of a more general result in [3] on the existence of final objects in 
hidden categories. (The careful reader will notice that the object introduced in 
the previous theorem is not isomorphic to the object defined in [3]. This is not 
surprising because, in fact, the object of [3] is not final. Nevertheless, a minor 
modification of the construction in [3] gives the right object which is, obviously, 
isomorphic to the functional version presented here.) 

From the programming point of view, this final object can be directly im- 
plemented, if functional programming is available. This has been the case in the 
symbolic computation systems for Algebraic Topology [8] which have been devel- 
oped in Common Lisp. This same direct implementation as a tuple of functions 
has also been applied in Artificial Intelligence [6]. In [6] the final object of the 
theorem is stored in a Common Lisp record (struct) similar to: 



(defstruct legraph vrt-eql adj-lst vrt-goal ...)

By also using functional programming, the unique morphism F which exists
from an object of HAlg(leGraph) to I is computable: this is deduced from
the exponential map (see [4]). For instance, if J is an object in HAlg(leGraph),
then given $x \in J(grp)$ the tuple $F(x) = [f_{\text{vrt-eql}}, \ldots] \in I(grp)$ is defined by:

\[ f_{\text{vrt-eql}}(v_1, v_2) := J(\text{vrt-eql})(x, v_1, v_2), \]

and so on. In addition, the object J is behaviourally equivalent (see [3], [4]) to the
sub-object F(J) of I. Roughly speaking, this implies that the code associated to
J can be replaced by F(J) without changing the meaning of the client programs.
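The construction of F(x) amounts to fixing the hidden argument of each operation. A minimal Python sketch follows; it is not the code of [4] or [6]. The record `LeGraphRecord`, the example algebra `ModularGraphs` (whose hidden values are pairs of a modulus and a target vertex) and the function `to_final_object` are all hypothetical and serve only as an illustration.

```python
from functools import partial
from typing import Callable, NamedTuple

class LeGraphRecord(NamedTuple):
    """An element of I(grp): a tuple of functions, one per operation."""
    vrt_eql: Callable
    adj_lst: Callable
    vrt_goal: Callable
    heuristics: Callable
    edge_cost: Callable

class ModularGraphs:
    """A hypothetical leGraph-algebra J. Every operation takes the hidden
    value x = (modulus, target) as its first argument."""

    @staticmethod
    def vrt_eql(x, u, v):
        return u == v

    @staticmethod
    def adj_lst(x, v):
        modulus, _target = x
        return [(v + 1) % modulus, (2 * v) % modulus]

    @staticmethod
    def vrt_goal(x, v):
        return v == x[1]

    @staticmethod
    def heuristics(x, v):
        return abs(x[1] - v)

    @staticmethod
    def edge_cost(x, u, v):
        return 1

def to_final_object(J, x):
    """F(x): close every operation of J over the hidden value x."""
    return LeGraphRecord(
        *(partial(getattr(J, name), x) for name in LeGraphRecord._fields)
    )

record = to_final_object(ModularGraphs, (7, 3))
print(record.adj_lst(5), record.vrt_goal(3))
```

A client program that only calls the five functions of the record cannot distinguish it from the original implementation, which is the informal content of behavioural equivalence.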

These theoretical results cast new light on the work in [6], where we were
able to directly reuse Forbus-de Kleer’s programs [2] without worrying about 
specific examples or application fields. We can now explain how the program 
called problem->legraph in [6], which transforms a problem (that is to say, the 
data structure used by Forbus-de Kleer to encapsulate the information about 
state-space problems) into a legraph (the Common Lisp struct evoked previ- 
ously) is, in fact, the composite of two more elementary functions: one which 
deals with a problem as an implementation of the signature leGraph and the 
other (universal) function translating this implementation into the final object 
I of Theorem 1. 

Thus, the conclusion of this analysis is that any production (or rule-based)
system is a (representation of a) locally effective graph and therefore that, thanks 
to the final property of I, the generic search procedures of [6] can be applied 
to it. From this point of view, any production system interpreter or rule-based 
inference engine (based on pattern-matching, see [2], or on some kind of unifi- 
cation algorithm) can be considered, in particular, as an implementation of the 
signature leGraph, achieving in this way the formal integration of the Production 
Systems and State-Space programming metaphors. 
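To make this integration concrete, the following Python sketch views a toy production system as a locally effective graph and applies a generic search to it. It is an illustration under our own assumptions, not the Common Lisp programs of [6] or [2]: the record `LeGraph` keeps only three operations, and the names `production_system_to_legraph` and `breadth_first_search`, as well as the two arithmetic rules, are hypothetical.

```python
from collections import deque
from typing import Callable, List, NamedTuple

class LeGraph(NamedTuple):
    vrt_eql: Callable[[int, int], bool]
    adj_lst: Callable[[int], List[int]]
    vrt_goal: Callable[[int], bool]

def production_system_to_legraph(rules, is_goal):
    """Each rule maps a state to a new state, or to None when not applicable."""
    def successors(state):
        results = (rule(state) for rule in rules)
        return [s for s in results if s is not None]
    return LeGraph(vrt_eql=lambda u, v: u == v,
                   adj_lst=successors,
                   vrt_goal=is_goal)

def breadth_first_search(graph, start, max_expansions=10_000):
    """Generic search: it only uses the three functions of the record."""
    visited = [start]
    frontier = deque([[start]])
    for _ in range(max_expansions):
        if not frontier:
            return None
        path = frontier.popleft()
        vertex = path[-1]
        if graph.vrt_goal(vertex):
            return path
        for successor in graph.adj_lst(vertex):
            if not any(graph.vrt_eql(successor, seen) for seen in visited):
                visited.append(successor)
                frontier.append(path + [successor])
    return None

rules = [
    lambda s: s + 3 if s + 3 <= 50 else None,   # rule 1: add three
    lambda s: s * 2 if s * 2 <= 50 else None,   # rule 2: double
]
graph = production_system_to_legraph(rules, is_goal=lambda s: s == 11)
print(breadth_first_search(graph, 1))
```

The search procedure never inspects the rules themselves, so the same code applies to any rule interpreter that can be wrapped in this way.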
Abstract. The paper describes a general interaction algorithm for coordinating
multi-agent plans. Triggered by a communication and negotiation protocol, the
coordination framework reconciles situations with negative interferences and
also handles positive opportunities for mutual benefit. Coordination is
understood here as a mechanism to reconcile plans by evaluating interactions
among agents. The process is a dynamic representation of the environment
where the structure of tasks goes from being a set of uncoordinated plans to
being a set of coordinated plans.



1 Introduction 

Multi-agent planning is the process of generating a plan among multiple agents where
agents' actions and potential interactions are previously specified. In such a plan,
agents reason about the potential consequences of their actions and about the
particular order in which these actions are to be executed. In this way the planner is
able to detect and control conflicting interactions induced by incompatible states or
by an incompatible resource usage, as well as positive interactions. Coordination
algorithms are required to manage agents' behaviour, and they have been understood in
two ways. On the one hand, coordination was a way of allocating particular tasks to
particular agents sharing a common goal [5]; on the other hand, it was a means of
achieving better coordination by aligning the behaviour of agents towards different
goals, with an explicit division of labour [11]. Tools and methodologies for the design
of agent communities have traditionally been based on this dichotomy. Although both
paradigms are closely related, both research lines have considered exclusive ad hoc
algorithms. Different sorts of coordination mechanisms were required in each case,
depending on pre-specified structural features.

This paper relaxes the strength of this dichotomy by providing a coordination
algorithm to manage interaction processes in different sorts of agent societies. In
our domain, agents have to share the same environment and the same set of physical
resources, so they have to plan taking into account the potential set of interactions
in such a domain. Triggered by a communication and negotiation protocol, the
coordination framework we propose reconciles situations with negative interferences
and also handles positive opportunities for mutual benefit. The algorithm itself is
flexible enough to keep open the choice of introducing new ways of interaction in
further studies. One of its main advantages is that it is suitable for inherently
cooperative domains [6,13] as well as for self-interested agent domains [17]. This is
an interesting feature, which allows us to work in different environments with the
same general framework.



2 Interactions Handling 

The study of plan relationships among autonomous agents is a crucial topic in
multi-agent systems research. Autonomous agents make plans with the a priori intention
of avoiding conflicts. Nevertheless, they might face situations where the best way of
fulfilling their own purposes is to interact with the rest of the agents in the
community [1]. For instance, some relationships fit into situations where tasks are too
long, or too difficult, to be carried out by a single agent [19,20]. In such a case the
affected agent can ask for help to share the task¹, assuming some kind of benevolence
in the community. Other, not necessarily cooperative, relationships relax this
benevolence property to allow situations where the interaction is motivated by mutual
benefits [17], or where the help relationship is simply replaced by a favour
relationship [14]. In such a case agents are not motivated by any natural benevolent
impulse².

Agents could also face situations which might prevent one or both of the plans 
from being executed as intended. The detection of negative relations is crucial for a 
successful plan execution. The negative interactions we take as reference are situations 
where agents need the same resource at the same time or situations where agents have 
incompatible strategies in achieving their plans [4]. Temporal relations are then asso- 
ciated with actions to ascertain whether a potential resource conflict exists. For in- 
stance, access to a non-consumable resource means no conflict if the time periods for 
utilizing such a resource do not overlap. This is not so if the resource is consumable. 
Temporal reasoning is also involved in solving conflicts when two actions require 
exclusive states to exist. For instance, domain-specific heuristics are needed to prevent 
two agents from occupying the same conference room at the same time for giving 
different talks. This means that pre-specified axioms are needed to state which actions 
should not occur simultaneously. Temporal reasoning is important to find out the 



¹ Task-sharing is a form of cooperation in which agents assist each other by sharing the load in
the problem solving. Task-sharing processes have also recently been studied for
self-interested agents in contract net protocols [18].

² Quantitative arguments are then provided in self-interested domains to evaluate the global
benefits of exploiting a favour relationship [2]. Even though doing a favour would mean
additional costs for the agent who is required to do the favour, agents should reason about the
entire set of plan relationships. Doing a favour might, for instance, imply personal benefits in
terms of reducing the costs of dealing with some other current potential conflict. This is
usually understood as an implicit utility transfer. Utility transfers form the basis for agreements
in these sorts of domains, and they are used as a negotiation currency to compensate one
agent for a disadvantageous agreement.

order in which the conflicts have to be resolved. The solution of a conflict might make
the solution of another conflict easier (or impossible); or if a conflict is settled, an- 
other conflict might vanish or a new one might arise. 
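The resource conflicts described above can be sketched in a few lines of Python. This is only an illustration: the paper does not give a data model, so the `Action` record, its fields and the rule used for consumable resources (any two uses by different agents are treated as a conflict, whatever their timing) are assumptions of ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    agent: str
    resource: str
    start: int                 # start time, in arbitrary units
    end: int                   # end time (exclusive)
    consumable: bool = False   # assumption: the resource is used up by the action

def overlaps(a, b):
    """True if the two time periods intersect."""
    return a.start < b.end and b.start < a.end

def resource_conflict(a, b):
    """Negative interaction between two actions of different agents."""
    if a.agent == b.agent or a.resource != b.resource:
        return False
    if a.consumable or b.consumable:
        return True            # assumption: timing does not help here
    return overlaps(a, b)      # non-consumable: conflict only if periods overlap

talk_1 = Action("g1", "conference room", 9, 11)
talk_2 = Action("g2", "conference room", 10, 12)
talk_3 = Action("g2", "conference room", 11, 12)
print(resource_conflict(talk_1, talk_2), resource_conflict(talk_1, talk_3))
```

The conference room example from the text corresponds to `talk_1` and `talk_2`; moving the second talk to a later slot, as in `talk_3`, removes the conflict.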

Nevertheless, in this paper we are not specifically focused on defining the set of 
interactions that agents have to face. Obviously such dependencies will depend on the 
organizational structure that agents are involved in. For instance, different interactions 
should be defined for a community of agents where every one works to get the same 
common goal, and for a community of agents attempting to maximize their own good. 
The study of agent dependencies is beyond the scope of this paper. Our concern here 
is about the kind of coordination algorithm -communication protocol and negotiation 
process- agents need in order to produce coordinated plans assuming a pre-specified 
set of plan relationships. 



3 Coordinating Agents 



3.1 A General Coordination Algorithm 

Coordination is the process of managing interdependencies between activities. This 
section focuses on the problem of handling these interdependencies in a domain- 
independent way. 

The general role of any coordination mechanism is to provide constraints for
agents' plans. Constraints are understood here as a modification process inside the
plans' structure, or as a process of making commitments. The knowledge state of a
coordinating agent includes its own plan, its knowledge about the rest of the agents'
plans and the relationships it holds with them. For each agent g its knowledge state is
a 3-tuple defined as follows:

$S_g = \langle LP_g, EP_g, PR_g \rangle$ where

• $LP_g$ denotes the set of leaf-actions intended by g's plan.

• $EP_g$ denotes the set of leaf-actions intended by the rest of the agents.

• $PR_g$ denotes the set of interactions g maintains with the rest of the agents.

The input state is given by $\langle S_{g_1}, \ldots, S_{g_n} \rangle$ where $\forall g\ EP_g = \emptyset$, and the output
state is obtained when $\forall g\ PR_g = \emptyset$.
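The knowledge state and the two boundary conditions can be written down directly. The Python sketch below is an illustration only: the class `KnowledgeState` and the functions `is_input_state`, `is_output_state` and `exchange_plans` are hypothetical names, and leaf-actions are represented by plain strings.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeState:
    LP: set                                   # leaf-actions of the agent's own plan
    EP: set = field(default_factory=set)      # leaf-actions of the other agents
    PR: set = field(default_factory=set)      # plan relationships still open

def is_input_state(states):
    """Input state: no agent knows anything yet about the others' plans."""
    return all(not s.EP for s in states.values())

def is_output_state(states):
    """Output state: no agent has a pending plan relationship."""
    return all(not s.PR for s in states.values())

def exchange_plans(states):
    """First step of the process: every agent learns the others' leaf-actions."""
    for g, state in states.items():
        state.EP = set().union(*(other.LP for h, other in states.items() if h != g))

states = {
    "g1": KnowledgeState(LP={"give talk", "print slides"}),
    "g2": KnowledgeState(LP={"hold seminar"}),
}
print(is_input_state(states))
exchange_plans(states)
print(states["g1"].EP, is_output_state(states))
```

In this sketch the output condition holds trivially after the exchange because no relationship has been detected; in the full process, detection would fill the PR sets and negotiation would empty them.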

The coordination process reconciles plans evaluating the possible interactions 
among agents, hypothesizing the best solution for a coordinated plan, and starting a 
negotiation process to arrive at mutually-agreed ways of handling every existing rela- 
tionship. The coordination process then is a dynamic representation of the environ- 
ment where the structure of tasks goes from being a set of uncoordinated plans to be a 
set of coordinated plans. In other words, the coordination problem has as input a set of
uncoordinated plans and as output a modified set of plans which are finally
coordinated.



Fig. 1. The coordination algorithm



Figure 1 shows the steps agents must carry out to complete the whole process. The
first step says that, after agents have developed their plans, they pass on the initial
information they consider relevant. A plan may be refined or specified by a group of
finer grained tasks which describe the agent's goal in more detail³. When plans are



³ Having actions at different levels of abstraction is useful to preserve agents' autonomy. It
would be desirable that agents exchange only the information they consider relevant. This is
only possible if actions are planned and exchanged at different levels of generality. So the
structure of a plan can be visualised as a directed graph with a general task as root node. The

refined, agents would exchange just the most effective actions, as initial information,
preserving the overall plan structure and the history of its refinement.

Exchanging plans at the beginning of the process allows every agent to update its
own knowledge base about the rest of the agents' actions. Knowing their own plan
and the actions of the other agents' plans, agents are able to detect plan relationships.
If any agent detects a real or potential plan relationship, it hypothesizes the best
solution for it and starts a conversation with the agent involved in such a relationship.
If an agreement is finally reached, agents modify their respective plans according to
the deal and let the community know the new plan changes in order to establish whether
or not the new agreement affects someone else. The coordination process ends when it
is ensured that there are no more relationships in the community.
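The overall loop (exchange, detect, negotiate, update, repeat until no relationship is left) can be sketched as follows. This Python fragment is a deliberately simplified illustration and not the algorithm of Figure 1: plans are reduced to lists of leaf-actions, the only relationship detected is a time overlap on the same resource, and the "deal" always delays the second agent's action. All names are hypothetical.

```python
import copy

def overlap(a, b):
    """True if the time periods of the two actions intersect."""
    return (a["start"] < b["start"] + b["duration"]
            and b["start"] < a["start"] + a["duration"])

def detect_relationships(plans):
    """Pairs of actions of different agents needing one resource at the same time."""
    found = []
    agents = sorted(plans)
    for i, g in enumerate(agents):
        for h in agents[i + 1:]:
            for a in plans[g]:
                for b in plans[h]:
                    if a["resource"] == b["resource"] and overlap(a, b):
                        found.append((g, a["name"], h, b["name"]))
    return found

def negotiate(relationship, plans):
    """Hypothetical deal: the second agent starts once the first has finished."""
    g, a_name, h, b_name = relationship
    a = next(x for x in plans[g] if x["name"] == a_name)
    b = next(x for x in plans[h] if x["name"] == b_name)
    b["start"] = a["start"] + a["duration"]

def coordinate(plans, max_rounds=100):
    """Repeat detection and negotiation until no relationship remains."""
    plans = copy.deepcopy(plans)          # the uncoordinated input is kept intact
    for _ in range(max_rounds):
        relationships = detect_relationships(plans)
        if not relationships:
            return plans
        negotiate(relationships[0], plans)
    raise RuntimeError("no coordinated set of plans found within max_rounds")

uncoordinated = {
    "g1": [{"name": "talk", "resource": "room", "start": 9, "duration": 2}],
    "g2": [{"name": "seminar", "resource": "room", "start": 10, "duration": 1}],
}
coordinated = coordinate(uncoordinated)
print(coordinated["g2"][0]["start"])
```

After each deal the detection step is run again on the whole community, which mirrors the requirement that agents announce plan changes so that new relationships caused by an agreement are noticed.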



3.2 The Negotiation Cycle 

Negotiation is a communication process between two or more agents in order to get an 
agreement. Negotiation varies depending on the kind of organizational structure 
agents are working in. For instance, negotiation could be to get a common goal (help 
or favour relationships), to resolve negative interactions (resource and time incom- 
patibilities), or to get mutual benefits (self-interested domains). Many negotiation 
mechanisms rely on techniques like Argumentation [16], Contract Net Protocols 
[19,20], Auction-Based Protocols [15], or techniques from economical paradigms 
such as Utility Theory and Decision Theory [17,21]. 

Negotiation is going to be understood here as reaching a commitment between 
agents through a structured message-passing where we specify: 

• Who the agents involved in a conversation are and who communicates to whom; 

• What the messages exchanged between negotiators are; and 

• How this process is conducted and when it takes place. 

At the beginning of the whole coordination process agents exchange their plans or the
part of their plans they consider transferable. We will not consider lies in our
approach: our honesty assumption excludes lies from our message-passing protocol.
Nevertheless, some researchers have allowed agents to lie in the negotiation process as a
way of knowing whether lies could be a useful resource in societies composed of
autonomous self-interested agents [21]. If agents detect⁴ plan relationships in the
exchanged information, they then start a negotiation process to resolve them. The
protocol is intended to respect agents' autonomy, to allow the flow of planned actions,
and to provide the system with several options for reaching an agreement.

The negotiation cycle can then be understood as a simple state diagram, as
shown in figure 2. The model is general enough to keep open the choice of
introducing, in further studies, new ways of interaction. The diagram is composed of a
set of states that agents are able to occupy, a set of messages which agents are able to
send, and a set of transition rules which establish the states in which agents are able
to send or receive every message.



immediate successor nodes from the root node are the most abstract actions in the plan. The
leaf nodes represent the most effective actions.

⁴ The basic algorithm to detect an interaction is directly obtained from the definition of
interaction [14,6,2]. In this way every agent in the community should be able to recognize the
set of potential plan interactions the system is dealing with.




Fig. 2. The Negotiation Cycle 



(A) The set of conversation states in which agents are able to take part
(C-States):

• Initial: starting state.

• Negotiable: actions in this state are involved in a plan relationship.

• Proposed: state in which an agent proposes modifications (solutions) to the
action concerned.

• Answered: state in which an agent answers a proposal.

• Committed: state in which agents reach an agreement (proposal acceptance is
confirmed).

• Un-solved: state in which there is no deal after refusing a proposal.

• Final: final state in which an action is approved (or rejected) for execution.

(B) The set of message types which agents are able to send (M-Types):

• Action: an agent announces the details of the action involved in a plan relationship.

• Proposal: the required agent proposes a modification to such an action to the
promoter agent.

• Refusal: the required agent refuses to negotiate about such an action or plan
relationship.

• Acceptance/rejection: possible reactions to the proposal.

• Confirmation (+/-): positive or negative confirmation from the required agent. This
message closes the initial negotiation or opens a new re-negotiation phase.

• Re-negotiation: the agent can start a re-negotiation phase due to any message
rejecting a proposal.

• Execution/withdrawal: the agent announces that the action is going to be executed as
previously agreed, or the agent announces why the negotiation process is going to
be dissolved.

(C) The conversation rules which specify the state transitions made through
the message types. The communicational behaviour of an agent is specified by a
5-tuple

< C-States, M-Types, receive, send, S0 > where S0 is the initial state and

• C-States are the previously defined ones.

• M-Types are the previously defined ones.

• receive: M-Types × C-States → C-States

• send: C-States → C-States × M-Types

Every agent is able to open an initial negotiation phase. The negotiation cycle is
opened by a message from an agent who has detected a real or potential plan relationship.
He requests the attention of the agent he considers relevant for solving the interaction
by sending him a message announcing the details of the action involved (duration,
required resources, etc.). From that point on, both agents have started a
negotiation process. The agent receiving this message reasons about the best solution
for him and, if he finds one, replies with a proposal to the promoter agent. He can also
announce that the negotiation process is going to be dissolved. Once the agent that sent the
original message receives the agent's reply (if any) he can accept or reject the proposed
plan-interaction solution. If the proposal is finally accepted, the
required agent confirms the agreement, closing the negotiation cycle^. If the
proposal is finally rejected, the required agent confirms that no agreement has
been reached, and he can either open a new re-negotiation phase or announce that
the negotiation process is going to be dissolved.
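
The negotiation cycle described above can be sketched as a small table-driven state machine. This is only an illustration: the figure with the actual transition rules is not reproduced here, so every entry of the `RECEIVE` table below is an assumption inferred from the prose, and the names `State`, `RECEIVE`, `receive` and `run` are hypothetical. The sketch also simplifies the protocol by not checking that a positive confirmation follows an acceptance rather than a rejection.

```python
from enum import Enum


class State(Enum):
    INITIAL = "Initial"
    NEGOTIABLE = "Negotiable"
    PROPOSED = "Proposed"
    ANSWERED = "Answered"
    COMMITTED = "Committed"
    UNSOLVED = "Un-solved"
    FINAL = "Final"


# Assumed transition table: (message type, current state) -> next state.
# It mirrors the signature receive: M-Types x C-States -> C-States.
RECEIVE = {
    ("action", State.INITIAL): State.NEGOTIABLE,
    ("proposal", State.NEGOTIABLE): State.PROPOSED,
    ("refusal", State.NEGOTIABLE): State.UNSOLVED,
    ("acceptance", State.PROPOSED): State.ANSWERED,
    ("rejection", State.PROPOSED): State.ANSWERED,
    ("confirmation+", State.ANSWERED): State.COMMITTED,
    ("confirmation-", State.ANSWERED): State.UNSOLVED,
    ("re-negotiation", State.UNSOLVED): State.NEGOTIABLE,
    ("withdrawal", State.UNSOLVED): State.FINAL,
    ("execution", State.COMMITTED): State.FINAL,
}


def receive(message, state):
    """Return the next conversation state, or fail if the message is not allowed."""
    key = (message, state)
    if key not in RECEIVE:
        raise ValueError(f"message {message!r} is not allowed in state {state.value}")
    return RECEIVE[key]


def run(messages, state=State.INITIAL):
    """Feed a sequence of message types through the conversation rules."""
    for message in messages:
        state = receive(message, state)
    return state


# An accepted proposal ends in the Final state after execution is announced.
accepted = run(["action", "proposal", "acceptance", "confirmation+", "execution"])
# A rejected proposal may re-open the negotiation.
reopened = run(["action", "proposal", "rejection", "confirmation-", "re-negotiation"])
```

Keeping the rules in a dictionary keeps the protocol data separate from the code that interprets it, so a different set of conversation rules could be swapped in without touching `receive`.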

In this protocol a commitment means that one agent binds itself to a potential deal
while waiting for the other agent to either accept or reject its offer. If the other party
accepts, both parties are bound to a contract. On the other hand, if the first offer has
been rejected, agents may exchange more information about their own plans (a new
possible plan refinement) in order to find the best way of solving the conflict
(re-negotiation proposal).



4 An Example 

Let us take an example about two agents having plans for a regular working day in the
same research institute. The agents' plans are independent, but they have to share the same
space and the same set of physical resources, so they have to coordinate in order to
successfully carry out their planned actions. In figure 3 we show each agent's plan and
the temporal intervals required for each action involved. The institute provides them
with a set of common resources such as, for instance, a printer and a projector. We
should also consider the common usage of a conference room used by the institute for
every organised event. The so-called room A is designated for such a purpose.

^ Notice that such an agreement might not be reached. The negotiation protocol will allow
agents to decide whether they want to take part in resolving the conflict or not.
Fig. 3. Two coordinating agents



At the moment of starting the coordination process the initial knowledge states are the 
following ones: 

S_{A-1} = < LP_{A-1}, EP_{A-1}, PR_{A-1} > where

LP_{A-1} = {talk-university, printing-slides, checking-e-mail, PhD-course-institute}

EP_{A-1} = ∅
PR_{A-1} = ∅

S_{A-2} = < LP_{A-2}, EP_{A-2}, PR_{A-2} > where

LP_{A-2} = {returning-books, printing-slides, seminar-talk-institute, going-doctor}

EP_{A-2} = ∅
PR_{A-2} = ∅

The agents' knowledge states change as the coordination process
evolves. Such a process depends on several factors related to the number of refinements
in each action and the particular negotiation protocol selected. In this example
we just show how this general coordination algorithm can be used to resolve every
plan interaction.

After exchanging mutual information the agents are able to detect every (potential) 
current interaction. Figure 4 shows a set of four interactions where the first interaction 
is understood as a favour relationship between agents; the second interaction is just a 
potential conflict induced by a consumable resource usage; and the last two interac- 
tions are real conflicts induced by some time/resources incompatibilities. 



[Figure: temporal line (a.m./p.m.) with the agents' planned actions: giving a talk at
the university; returning some books to the university library; printing some slides
(once in each plan); teaching a PhD course at the institute (room A); giving a seminar
talk at the institute (room A).]




Fig. 4. Set of Agents Interactions 



The agents' knowledge states would then be updated as follows:

S_{A-1} = < LP_{A-1}, EP_{A-1}, PR_{A-1} > where

LP_{A-1} = {giving-talk-university, printing-slides, checking-e-mail, PhD-course-institute}
EP_{A-1} = LP_{A-2}

PR_{A-1} = {(talk-university_{A-1}, return-books_{A-2}, Interaction-1),
(printing-slides_{A-1}, printing-slides_{A-2}, Interaction-2),
(PhD-course-projector_{A-1}, seminar-talk-projector_{A-2}, Interaction-3),
(PhD-course-roomA_{A-1}, seminar-roomA_{A-2}, Interaction-4)}

S_{A-2} = < LP_{A-2}, EP_{A-2}, PR_{A-2} > where

LP_{A-2} = {returning-books, printing-slides, seminar-talk-institute, going-doctor}
EP_{A-2} = LP_{A-1}

PR_{A-2} = {(return-books_{A-2}, talk-university_{A-1}, Interaction-1),
(printing-slides_{A-2}, printing-slides_{A-1}, Interaction-2),
(seminar-projector_{A-2}, PhD-course-projector_{A-1}, Interaction-3),
(seminar-roomA_{A-2}, PhD-course-roomA_{A-1}, Interaction-4)}
Once conflicts have been detected, agents should start resolving them, adjusting their
knowledge states along the way. It is here where the negotiation protocol is applied.

In order to resolve the first of our interactions, A-2 sends a message to A-1 proposing
that A-1 carry and return A-2's books, because A-1's journey is necessary in any case. We
assume the university is several miles away from the research institute where they go to work
every day. In this way A-2 could delete such an action from its plan if A-1 finally
accepts the request. After receiving the message A-1 evaluates the proposal positively
because the favour is not going to cost any great additional effort®. Thus it sends an
acceptance message to A-2. The solution for the favour interaction-1 is finally
concluded through a confirmation message sealing the agreement. A-2 will delete the
returning-books action from its plan and A-1 will add the action to its own plan. Then
both agents update their respective knowledge states as follows’:

S_{A-1} = < LP_{A-1}, EP_{A-1}, PR_{A-1} > where

LP_{A-1} = {talk-university, returning-A-2's-books, printing-slides, checking-e-mail,
PhD-course-institute}

EP_{A-1} = LP_{A-2}

PR_{A-1} = {(printing-slides_{A-1}, printing-slides_{A-2}, Interaction-2),
(PhD-course-projector_{A-1}, seminar-talk-projector_{A-2}, Interaction-3),
(PhD-course-roomA_{A-1}, seminar-roomA_{A-2}, Interaction-4)}

S_{A-2} = < LP_{A-2}, EP_{A-2}, PR_{A-2} > where

LP_{A-2} = {printing-slides, seminar-talk-institute, going-doctor}

EP_{A-2} = LP_{A-1}

PR_{A-2} = {(printing-slides_{A-2}, printing-slides_{A-1}, Interaction-2),
(seminar-talk-projector_{A-2}, PhD-course-projector_{A-1}, Interaction-3),
(seminar-talk-roomA_{A-2}, PhD-course-roomA_{A-1}, Interaction-4)}

In the next potential conflict agents evaluate whether the printer toner is full enough to
print all the agents' printouts. Since the printer is considered a consumable resource
in this case, agents have to make sure that the printer is going to work (that is, that it
is going to have enough ink) during the interval covering both actions. Fortunately they find
out that the toner has been recently replaced and there is no need to start a negotiation
process. This means that the potentially conflicting resource is not going to create any
negative interaction. Then

PR_{A-1} = {(PhD-course-projector_{A-1}, seminar-talk-projector_{A-2}, Interaction-3),
(PhD-course-roomA_{A-1}, seminar-talk-roomA_{A-2}, Interaction-4)}

PR_{A-2} = {(seminar-projector_{A-2}, PhD-course-projector_{A-1}, Interaction-3),
(seminar-talk-roomA_{A-2}, PhD-course-roomA_{A-1}, Interaction-4)}



® Quantitative criteria are required in positive interactions in order to estimate benefits [14], 
[17]. Game-theoretic techniques are considered suitable tools for designing self-interested 
automated negotiation. 

’ After every plan modification agents should inform the other agents in the community in
order to see whether the changes affect anyone else. This step is not necessary if the
community is composed of two agents, since both of them already know about each other's changes.
The two remaining conflicts are closely related in the sense that the order in which
they are handled could help to resolve them more efficiently. On the one hand we see
that both actions are planned to be carried out at the same place in overlapping temporal
intervals, and on the other hand the same overhead projector is required in both
actions. The overlapping time is not too long, and both agents know that if the two
actions did not overlap there would be no conflict over the non-consumable resource.
This is why one of the two agents sends a proposal to reduce both action intervals. For
instance, A-2 could finish her talk 5 minutes earlier and A-1 could start his PhD course
5 minutes later. This proposal avoids the conflict caused by incompatible actions and
removes the conflict related to the shared usage of the same overhead projector. Finally,
if the agreement is successfully reached, the agents' plans would be coordinated so
that PR_{A-1} = ∅ and PR_{A-2} = ∅.
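
The interval-shrinking proposal can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the paper does not give the actual clock times, so the two intervals below (in minutes since midnight) are hypothetical values chosen to overlap by ten minutes, and the helper names `overlap` and `shrink` are introduced only for this illustration.

```python
def overlap(first, second):
    """Length in minutes of the intersection of two (start, end) intervals."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return max(0, end - start)


def shrink(earlier, later, minutes):
    """Earlier action ends `minutes` sooner, later action starts `minutes` later."""
    return (earlier[0], earlier[1] - minutes), (later[0] + minutes, later[1])


# Hypothetical times: seminar talk 15:00-17:05, PhD course 16:55-19:00 (room A).
seminar = (15 * 60, 17 * 60 + 5)
course = (16 * 60 + 55, 19 * 60)

conflict_before = overlap(seminar, course)
new_seminar, new_course = shrink(seminar, course, 5)
conflict_after = overlap(new_seminar, new_course)
```

With these assumed times the overlap disappears after the adjustment, which removes both the room conflict and the projector conflict at once, as argued in the text.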



5 Concluding Remarks and Related Work 

Since the early 80s the process of decentralized coordination has been concerned with 
the particular task of avoiding harmful plan interactions in communities composed of 
autonomous agents [11]. Nevertheless the study of interaction algorithms for coordinating
plans has been mainly related to the intended multi-agent system and its particular
organizational structure. For instance, Partial Global Planning (PGP) [9] is one
of the most successful multi-agent coordination mechanisms to manage Distributed 
Problem Solving in cooperative domains such as Distributed Vehicle Monitoring 
Testbed domains. Nevertheless, PGP does not seem to be suitable enough to coordi- 
nate communities of agents with autonomous and independent goals. PGP assumes a 
common goal to achieve general and coherent results sharing the same set of evalua- 
tion criteria. PGP was generalised some years later [8] in order to provide cooperative 
systems with a set of flexible and domain-independent coordination mechanisms. 

On the other hand, game theoretic techniques provide tools to describe rational 
agency in terms of self-interested purposes [10]. The standard notions of Game The- 
ory are traditionally used in Distributed Artificial Intelligence to see the extent to 
which self-interested agents are able to cooperate via negotiation without any prefixed 
benevolence assumption. Agents agree to achieve the goal after a negotiation process 
based on probabilistic reasoning or any other economically rational criteria. 

Nowadays universal coordination algorithms are needed to work in different envi- 
ronments with the same general and suitable framework [3,4,7]. The general interac- 
tion algorithm we have shown in this paper is flexible enough to be used in coopera- 
tive domains as well as in domains where agents work independently for their own 
goals. The paper describes a mechanism to detect and control conflicting interactions 
induced by incompatible states or by incompatible resource usage, as well as positive 
interactions. The study of agent dependencies, however, is beyond the scope of this 
paper. Our concern is about how the community should behave in order to produce 
coordinated plans assuming a pre-specified set of plan relationships. Coordination is 
used then to reconcile plans evaluating interactions among agents, hypothesizing the 
best solution for a coordinated plan, and starting a negotiation process to resolve every 
existing conflict. It is a dynamic representation of the environment where the structure
of tasks goes from being a set of uncoordinated plans to being a set of coordinated plans.



6 Future Work 

Every coordination mechanism is expected to be integrated somehow as part of the
agents' architecture or as part of the agents' reasoning. In our approach the coordination
process is a priori independent of the kind of agency we want for our agents.
Nevertheless it would be necessary to see the extent to which such an external coordinating
approach affects the internal behaviour of every member in the community, and
vice versa. In other words, it would be interesting to show how the coordinating module
is related to each agent's knowledge base in terms, for instance, of mental attitudes
about the world or about the rest of the agents.

An interesting domain where we shortly expect to apply the negotiation model we 
have presented here is the electronic commerce scenario. Agent-based e-commerce is 
about any kind of electronic commercial transaction -buying and selling products on 
the net- trying to compare prices and features of products from different vendors in 
order to get a good (preferably the best) transaction. Particularly interesting are virtual 
agent-based markets and methods for utilitarian coalition formation among rational 
information agents [12]. The kinds of interactions that agents should face in this envi- 
ronment are about how to determine the terms of the transaction such as prices, qual- 
ity, delivery, etc. Negotiation would finally take place at that stage. Depending on the 
particular market we are considering, the complexity of the negotiation process can 
vary. In some markets prices and other aspects of the transaction are often fixed leav- 
ing no room for negotiation. In other markets (e.g. stock, automobile, fine art, local 
markets, etc.) the negotiation of price or other aspects of the deal depends on the 
product and the merchant behaviour. The model we have presented here is intended 
for extension and application in the near future to manage agent-based e-commerce 
interactions. 
Abstract. The problem of isomorph-free search is analysed in a group
theoretic framework, some algorithms are given for building searches in 
a modular way, and the context in which they can be applied is charac- 
terised as generally as possible. These results are applied to the problem 
of building finite models of monadic first-order sorted logic. 



1 Introduction 

When searching for objects with specific properties in huge search spaces, it is 
often impossible to keep the full space within memory, and the searches have 
to rely on some technique for generating all possible candidates. The efficiency 
of the search then relies heavily on the precision with which those candidates 
are generated with respect to the expected property. But the search is necessary 
because the objects searched for are not to be constructed directly for lack of a 
dedicated program. The generating mechanism has to retain some generality. 

Of course, the objects produced this way can only be computer represen- 
tations of some possibly more abstract structure. Some structures have direct 
computer representation, like integers, or strings. Others however do not, in the 
sense that a unique representation may not easily be found. The typical ex- 
ample is the structure of graphs; though heavily used in computer science and 
especially in artificial intelligence, the computer representation of graphs is im- 
perfect in the sense that no polynomial algorithm is known to tell whether two 
representations correspond to the same graph. This problem is known as graph 
isomorphism (see [3], also for group related topics). 

This means that most search procedures are unable to cope with isomor- 
phisms, and generate many isomorphic representations of abstract objects. This 
is useless as far as the property searched for is invariant, i.e. stable on each iso- 
morphism class (but non-invariant properties are mathematically meaningless). 
This is dreadful when the isomorphism classes are huge. Some computational ef- 
fort can therefore be profitably spent on isomorph-free generating mechanisms. 

But how is this to be done? It is certainly not practical to embed the structure 
of interest into a fixed one, to generate for example all graphs of a given size if 
we are only interested in trees. Is there some general way to analyse a structure, 
and deduce from this analysis an isomorph-free search of the structure? The aim 
of this paper is to provide some ways of doing this. 
In the next section we develop the general setting, in which isomorphism
classes are given by an equivalence relation. In section 3 we specialise to classes 
given through operations of groups, which allows the use of group theoretic 
constructs and algorithms. Section 4 is devoted to the application of these results 
to the finite models of monadic logic. Conclusion and perspectives are reached 
at the end of the paper. 

2 The General Framework 

For a function $f : A \to B$, a partial function $s : A \to A$, $o \in A$ and an equivalence
relation $\equiv$ on $B$, we say that $o, s$ is a $\equiv$-enumeration of $B$ through $f$ iff

$\forall x \in B,\ \exists n \in \mathbb{N} \mid x \equiv f(s^n(o))$ (completeness)

$\forall n, n' \in \mathbb{N},\ f(s^n(o)) \equiv f(s^{n'}(o)) \Rightarrow n = n'$ (frugality)

When $A = B$ and $f$ is the identity, we say that $o, s$ is a $\equiv$-enumeration of $A$.
We call $o$ the initial element of the enumeration, $s$ its successor function.

The reason why $s$ is partial is simply to make the last element of the enumeration
$l = s^m(o)$ special by letting $s(l)$ be undefined. Then all $s^k(o)$ for $k > m$ are
undefined, and the completeness and frugality conditions above are only meaningful
for $n, n' \le m$. This, of course, does not mean that an implementation of
$s$ should be non-terminating on $l$. The undefinedness of $s(l)$ should rather be
signalled by returning a special value, or raising an exception, or any other way
the programmer may find convenient, and is free to devise.
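
The two conditions can be checked by brute force on a small finite example. The following is a minimal sketch, not part of the original development: it assumes that the undefinedness of $s$ is signalled by returning `None`, that $B$ is a finite collection, and that the equivalence is given as a Python predicate; the names `trajectory` and `is_enumeration` are hypothetical.

```python
def trajectory(o, s):
    """Yield o, s(o), s(s(o)), ... until the partial successor returns None."""
    x = o
    while x is not None:
        yield x
        x = s(x)


def is_enumeration(o, s, f, B, equiv):
    """Check completeness and frugality of (o, s) as an enumeration of B through f."""
    images = [f(x) for x in trajectory(o, s)]
    # Completeness: every element of B is equivalent to some enumerated image.
    complete = all(any(equiv(y, img) for img in images) for y in B)
    # Frugality: no two distinct steps give equivalent images.
    frugal = all(
        not equiv(images[i], images[j])
        for i in range(len(images))
        for j in range(i + 1, len(images))
    )
    return complete and frugal


# Toy example: B = {0, ..., 9} modulo "same residue modulo 3".
B = range(10)
same_residue = lambda x, y: x % 3 == y % 3
identity = lambda x: x

good = is_enumeration(0, lambda x: x + 1 if x < 2 else None, identity, B, same_residue)
redundant = is_enumeration(0, lambda x: x + 1 if x < 3 else None, identity, B, same_residue)
incomplete = is_enumeration(0, lambda x: x + 1 if x < 1 else None, identity, B, same_residue)
```

Here the first successor visits 0, 1, 2 (one element per class), the second also visits 3 and so breaks frugality, and the third stops at 1 and so misses a class.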

A trivial example occurs when $\equiv$ has only one class on $A$; then for any $o \in A$,
we have $o, \emptyset$ as a $\equiv$-enumeration of $A$ (where $\emptyset$ is the nowhere-defined successor).

The reason why we need the set $A$ and function $f$ in the definition is simply
that the elements of $B$ may not convey sufficient information to compute their
$\equiv$-successors, and we may need to compute them through $f$ from the information
provided in A. Another application is to replace an enumeration of some struc- 
ture by the enumeration of another more convenient and isomorphic structure, 
as is made clear in the following simple theorem: 

Theorem 1. If $o, s$ is a $\equiv$-enumeration of $A$, $\approx$ is an equivalence relation on
$B$ and $f : A \to B$ verifies:

$\forall x, y \in A,\ f(x) \approx f(y) \Leftrightarrow x \equiv y$ and $\forall y \in B,\ \exists x \in A \mid f(x) \approx y$

then $o, s$ is a $\approx$-enumeration of $B$ through $f$.

Proof. We first prove completeness: for $y \in B$, there exists an $x \in A$ such that
$f(x) \approx y$. But by completeness of the $\equiv$-enumeration, we have $\exists n \in \mathbb{N}$ such that
$x \equiv s^n(o)$, and then $y \approx f(x) \approx f(s^n(o))$. We now prove frugality: $\forall n, n' \in \mathbb{N}$,
suppose $f(s^n(o)) \approx f(s^{n'}(o))$, then we have $s^n(o) \equiv s^{n'}(o)$, and by frugality of
the $\equiv$-enumeration we get $n = n'$.
This theorem is not very surprising due to the strong property of $f$; it means
that we have been able to fully characterise the equivalence classes of $B$ as
equivalence classes of $A$. It is not in itself helpful for decomposing a difficult
search problem into simpler ones. The following theorem is more useful in this
respect, since it is often the case that some invariant function like, e.g. the
number of edges, of vertices, the degree of a graph, etc., can make the search
easier if only one value of the invariant function is considered at a time. Given
sets $A$ and $B$ with equivalence relations $\equiv$ and $\approx$, a function $\iota : A \to B$ is
invariant if $\forall x, y \in A,\ x \equiv y \Rightarrow \iota(x) \approx \iota(y)$.

Theorem 2. If $o, b$ is a $\approx$-enumeration of $B$ through $g : B' \to B$, $\equiv$ is an
equivalence relation on $A$ and $\iota : A \to B$ is invariant; if $\forall i \in B$, we have a
$\equiv$-enumeration $o_i, a_i$ of $A_i = \{x \in A \mid \iota(x) \approx i\}$ through $f_i : A'_i \to A_i$, then we get
a $\equiv$-enumeration $\theta, s$ of $A$ through $h : A' \times B' \to A$ with: $A' = \bigcup_{k \in \mathbb{N}} A'_{g(b^k(o))}$,
$\theta = (o_{g(o)}, o)$ and:

$\forall (y, j) \in A' \times B',\ h((y, j)) = f_{g(j)}(y)$

$s((y, j)) = (a_{g(j)}(y), j)$ if defined, $(o_{g(b(j))}, b(j))$ otherwise.

Proof. It should be clear that any $s^n(\theta)$ is a tuple $(a_{g(j)}^m(o_{g(j)}), j)$ with $j = b^k(o)$,
and conversely that any such tuple can be obtained exactly once as a $s^n(\theta)$.

We first prove completeness: $\forall x \in A$, let $i = \iota(x)$, then $\exists k \in \mathbb{N} \mid i \approx
g(b^k(o))$. We have $x \in A_i = A_{g(b^k(o))}$, so let $j = b^k(o)$, we have $\exists m \in \mathbb{N} \mid x \equiv
f_{g(j)}(a_{g(j)}^m(o_{g(j)}))$, hence from the preceding remark we have $\exists n \in \mathbb{N} \mid s^n(\theta) =
(a_{g(j)}^m(o_{g(j)}), j)$. We therefore have $h(s^n(\theta)) = f_{g(j)}(a_{g(j)}^m(o_{g(j)})) \equiv x$.

Now suppose that $h(s^n(\theta)) \equiv h(s^{n'}(\theta))$, which means by the remark above
that for some $k, m, k', m' \in \mathbb{N}$ we have $f_i(a_i^m(o_i)) \equiv f_{i'}(a_{i'}^{m'}(o_{i'}))$ with $i =
g(b^k(o))$ and $i' = g(b^{k'}(o))$. By the property of invariance of $\iota$ we get:

$i \approx \iota(f_i(a_i^m(o_i))) \approx \iota(f_{i'}(a_{i'}^{m'}(o_{i'}))) \approx i'$

(since $f_i(a_i^m(o_i)) \in A_i$) which implies that $k = k'$ by frugality of $o, b$, so that
$i = i'$, and then that $m = m'$ by frugality of the enumeration through $f_i = f_{i'}$. By the
remark above we get $n = n'$.
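
The construction of Theorem 2 can be tried out on a toy instance. The sketch below is an illustration under assumptions, not the authors' implementation: undefined successors are represented by `None`, the family of sub-enumerations is supplied by a hypothetical callback `sub(i)` returning the triple $(o_i, a_i, f_i)$, and the example (integers modulo 6, with parity as the invariant) is invented for the purpose.

```python
def compose(o, b, g, sub):
    """Build (theta, s, h) from an enumeration (o, b) through g and sub-enumerations."""

    def h(state):
        y, j = state
        _, _, f_i = sub(g(j))
        return f_i(y)

    def s(state):
        y, j = state
        _, a_i, _ = sub(g(j))
        nxt = a_i(y)
        if nxt is not None:              # stay within the current invariant value
            return (nxt, j)
        j_next = b(j)                    # otherwise move to the next invariant value
        if j_next is None:
            return None
        o_next, _, _ = sub(g(j_next))
        return (o_next, j_next)

    theta = (sub(g(o))[0], o)
    return theta, s, h


def run(theta, s, h):
    """Collect the images h(s^n(theta)) for all defined n."""
    out, state = [], theta
    while state is not None:
        out.append(h(state))
        state = s(state)
    return out


# Toy instance: A = integers modulo 6, invariant = parity, B = {0, 1} with equality.
def b(j):
    return 1 if j == 0 else None


def g(j):
    return j


def sub(i):
    # Enumeration of the residues modulo 6 of parity i: i, i + 2, i + 4.
    def a_i(y):
        return y + 2 if y + 2 < 6 else None
    return i, a_i, (lambda y: y)


theta, s, h = compose(0, b, g, sub)
reps = run(theta, s, h)
```

In this instance the composed enumeration first exhausts the even residues and then the odd ones, so each class modulo 6 is produced exactly once.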



3 Enumerations Modulo Group Orbits 

In the sequel we will have to consider many different relations on sets of rather 
complex objects, such as functions or some species of graphs, and relate them 
in some readable way. We will do this through groups and group operations. 
An operation of a group $G$ on a set $A$ is a function $op : A \times G \to A$ such
that $\forall x \in A, \forall \sigma, \pi \in G$, we have $op(op(x, \sigma), \pi) = op(x, \sigma\pi)$. The reader should
note that we have left implicit the name of the product in $\sigma\pi$; this is usually
harmless since only one group product is linked to a symbol $G$. Similarly, there
is usually only one operation linked to a symbol $A$, and the standard notation $x^\sigma$
for $op(x, \sigma)$ consequently leaves $op$ implicit; $x^\sigma$ does not have the same meaning
whether $x \in A$ or $x \in B$. The same holds for some other notations, such as
$G_x = \{\sigma \in G \mid x^\sigma = x\}$, the stabiliser of $x$ in $G$ (a subgroup of $G$; this is noted
$G_x \le G$).

Given a group $G$ and its operation on $A$, the corresponding equivalence relation
we will consider on $A$ is the $G$-orbit relation $x \sim_G y \Leftrightarrow \exists \sigma \in G \mid x^\sigma = y$.
Its equivalence classes are called $G$-orbits of $A$. In what follows, $G$-enumeration
stands for $\sim_G$-enumeration.

In case there is no explicit operation on $A$ and we still consider $G$-enumerations
of $A$, this means that we consider the trivial operation of $G$ on $A$,
defined as $x^\sigma = x$, and that we refer to a $=$-enumeration.
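
Orbits and stabilisers are easy to compute by brute force for small groups. As a hedged illustration (the example is not taken from the paper), the sketch below lets the cyclic group of order 4, represented by the integers 0 to 3 under addition modulo 4, operate on binary words of length 4 by rotation; the function names are hypothetical.

```python
from itertools import product


def rotate(x, k):
    """Operation of the group element k on the word x: rotate left by k positions."""
    k %= len(x)
    return x[k:] + x[:k]


def orbit(x, group, act):
    """The set of all images of x under the group."""
    return frozenset(act(x, g) for g in group)


def stabiliser(x, group, act):
    """The group elements that leave x fixed."""
    return [g for g in group if act(x, g) == x]


C4 = range(4)
words = list(product((0, 1), repeat=4))

# The defining property of an operation: acting by a, then by b, is acting by a + b.
is_operation = all(
    rotate(rotate(w, a), b) == rotate(w, (a + b) % 4)
    for w in words for a in C4 for b in C4
)

orbits = {orbit(w, C4, rotate) for w in words}
stab = stabiliser((0, 1, 0, 1), C4, rotate)
orb = orbit((0, 1, 0, 1), C4, rotate)
```

An isomorph-free search over these words would have to produce exactly one word from each element of `orbits`.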

3.1 Enumeration of Standard Products 

One usual way of representing complex objects is to see them as tuples of simpler
objects; e.g. a labelled graph consists of a graph and a labelling function.
Hence, given enumerations for $A$ and $B$, we would like to provide an enumeration
of $A \times B$: intuition may suggest that the mere product of the enumerations of
$A$ and $B$ is sufficient, but this is wrong. Consider as above $o \in A = B$ and $G$
such that $o, \emptyset$ is a $G$-enumeration of $A$ (i.e. $G$ is transitive on $A$), then $(o, o), \emptyset$ is
not a $G$-enumeration of $A \times A$, since obviously the $G$-orbit of $(o, o)$ is limited to
the diagonal of $A \times A$. Even if $A \ne B$ the coordinates may not be independent
objects, and we have to suitably refine the equivalence relation on $B$, which is
done by means of stabilisers.

Theorem 3. Considering a group $G$ with operations on $A$ and $B$, and the standard
extension of these operations on $A \times B$ (i.e. $(x, y)^\sigma = (x^\sigma, y^\sigma)$), if $o, a$ is a
$G$-enumeration of $A$ through $f : A' \to A$ and $\forall H \le G$, $o_H, b_H$ is a $H$-enumeration
of $B$ through $g : B' \to B$, let $o^\times = (o, o_{G_{f(o)}})$ and let $s : A' \times B' \to A' \times B'$ be the
partial function defined by:

$s((x, y)) = (x, b_H(y))$ if defined, $(x', o_{H'})$ otherwise,
where $x' = a(x)$, $H = G_{f(x)}$, $H' = G_{f(x')}$

then $o^\times, s$ is a $G$-enumeration of $A \times B$ through $(f, g)$.

Proof. First, it should be clear that every $s^n(o^\times)$ is of the form $(a^m(o), b_H^k(o_H))$
with $H = G_{f(a^m(o))}$, that $m, k$ is uniquely determined by $n$, and reciprocally
that every such tuple (as long as it is defined) is equal to some $s^n(o^\times)$.

Let $(x, y) \in A \times B$. By completeness of $o, a$ we have $\exists m \in \mathbb{N}, \exists \sigma \in G$ such that
$x^\sigma = f(a^m(o))$. Let $x' = f(a^m(o))$ and $H = G_{x'}$. We apply the completeness of
$o_H, b_H$ to $y^\sigma$: $\exists k \in \mathbb{N}, \exists \pi \in H \mid y^{\sigma\pi} = g(b_H^k(o_H))$. We therefore have

$(x, y)^{\sigma\pi} = (x'^{\pi}, y^{\sigma\pi}) = (f, g)((a^m(o), b_H^k(o_H)))$

$= (f, g)(s^n(o^\times))$ for some $n \in \mathbb{N}$

which proves completeness since $\sigma\pi \in G$.

Now suppose $\exists \sigma \in G \mid (f, g)(s^n(o^\times))^\sigma = (f, g)(s^{n'}(o^\times))$, and that $s^n(o^\times) =
(a^m(o), b_H^k(o_H))$ with $H = G_{f(a^m(o))}$, and similarly $s^{n'}(o^\times) = (a^{m'}(o), b_{H'}^{k'}(o_{H'}))$
with $H' = G_{f(a^{m'}(o))}$. By equating coordinates we get

$f(a^m(o))^\sigma = f(a^{m'}(o))$ and $g(b_H^k(o_H))^\sigma = g(b_{H'}^{k'}(o_{H'}))$.

From the first we get $m = m'$, thus $H = H'$ and $\sigma \in H$. Hence from the second
we get $k = k'$, and this proves $n = n'$.

It should be noted that this theorem encompasses the case where the coordinates
are independent, since then the operation of $G_{f(x)}$ on $B$ is exactly the
operation of $G$ on $B$. The successor function for $A \times B$ then exactly corresponds
to the intuitive enumeration of the Cartesian product of the set of canonical
elements obtained by the enumerations of each coordinate. The wise programmer
would however implement a special successor function for this independent
product in order to avoid the useless computation of $G_{f(x)}$.
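
The role of the stabiliser in Theorem 3 can be observed on the smallest interesting case. The following brute-force sketch is an illustration under assumptions rather than the successor function of the theorem: it computes orbit representatives exhaustively, takes the symmetric group on three points as $G$ acting on $A = B = \{0, 1, 2\}$, and uses hypothetical helper names.

```python
from itertools import permutations, product


def orbit_reps(points, group, act):
    """Smallest element of each orbit of `group` on `points` (brute force)."""
    seen, reps = set(), []
    for x in sorted(points):
        if x not in seen:
            reps.append(x)
            seen.update(act(x, g) for g in group)
    return reps


def stabiliser(x, group, act):
    """Elements of `group` fixing x."""
    return [g for g in group if act(x, g) == x]


def product_reps(A, B, group, act):
    """Pair each G-representative x of A with representatives of B modulo G_x."""
    reps = []
    for x in orbit_reps(A, group, act):
        H = stabiliser(x, group, act)
        for y in orbit_reps(B, H, act):
            reps.append((x, y))
    return reps


S3 = list(permutations(range(3)))          # a permutation p maps the point i to p[i]
point_act = lambda x, p: p[x]
pair_act = lambda xy, p: (p[xy[0]], p[xy[1]])

A = range(3)
via_stabilisers = product_reps(A, A, S3, point_act)
brute_force = orbit_reps(list(product(A, A)), S3, pair_act)
naive_product = [(x, y) for x in orbit_reps(A, S3, point_act)
                 for y in orbit_reps(A, S3, point_act)]
```

The naive product of representatives yields only the diagonal pair, whereas refining the second coordinate by the stabiliser of the first recovers the off-diagonal orbit as well.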

3.2 Refinement to a Subgroup 

In the previous theorem we have used a family of enumerations of $B$, one for each
$H \le G$. We will now prove that it is possible to compute each of these from the
enumeration relative to $G$, by use of double cosets. For any $H, K$ subgroups of $G$
and any $\sigma \in G$, we call double coset of $\sigma$ the set $H\sigma K = \{\rho\sigma\pi \mid \rho \in H, \pi \in K\}$.
It is well-known that for fixed $H, K$, the double cosets form a partition of $G$. In
other words, the relation defined by $\forall \sigma, \sigma' \in G,\ \sigma \sim \sigma' \Leftrightarrow H\sigma K = H\sigma' K$
is an equivalence relation.
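
The partition property can be checked directly on a small group. The sketch below is only an illustration with an invented example: permutations of three points are stored as tuples, `compose(p, q)` applies `p` first and `q` second (an assumed convention), and both subgroups are taken to be the two-element group generated by the transposition of 0 and 1.

```python
from itertools import permutations


def compose(p, q):
    """Product of two permutations: apply p first, then q."""
    return tuple(q[i] for i in p)


def double_coset(H, s, K):
    """The double coset H s K as a set of permutations."""
    return frozenset(compose(compose(h, s), k) for h in H for k in K)


S3 = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]                 # identity and the transposition of 0 and 1

cosets = {double_coset(H, s, H) for s in S3}
sizes = sorted(len(c) for c in cosets)
covered = set().union(*cosets)
```

Note that, unlike ordinary cosets, the double cosets in this example do not all have the same size.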

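Theorem 4. If $o, a$ is a $G$-enumeration of $A$ through $f : A' \to A$, $H$ is a
subgroup of $G$, and $\forall x \in A'$, $\alpha_x, d_x$ is an enumeration of $G$ modulo the double
cosets $G_{f(x)}\sigma H$, let $\theta = (o, \alpha_o)$,

let $g : A' \times G \to A$ be the function defined by $g((x, \sigma)) = f(x)^\sigma$, and

$s((x, \sigma)) = (x, d_x(\sigma))$ if defined, $(a(x), \alpha_{a(x)})$ otherwise;

then $\theta, s$ is a $H$-enumeration of $A$ through $g$.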

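Proof. As above, we have a 1-1 correspondence between the $n$'s and the $m, k$'s
such that $s^n(\theta) = (x, d_x^k(\alpha_x))$ with $x = a^m(o)$. We now prove completeness.

$\forall x \in A, \exists m \in \mathbb{N}, \exists \sigma \in G \mid x^\sigma = f(a^m(o))$ by completeness of $o, a$. Let
$x' = a^m(o) \in A'$, by completeness of $\alpha_{x'}, d_{x'}$ applied to $\sigma^{-1}$, we have $\exists k \in \mathbb{N}$
such that $d_{x'}^k(\alpha_{x'}) \in G_{f(x')}\sigma^{-1}H$, thus $\exists \rho \in G_{f(x')}, \exists \pi \in H \mid d_{x'}^k(\alpha_{x'}) = \rho\sigma^{-1}\pi$.
By the remark above we have $\exists n \in \mathbb{N} \mid s^n(\theta) = (x', \rho\sigma^{-1}\pi)$, thus $g(s^n(\theta)) =
f(x')^{\rho\sigma^{-1}\pi} = x^\pi$, which proves completeness.

$\forall n, n' \in \mathbb{N}$, suppose $\exists \pi \in H \mid g(s^{n'}(\theta))^\pi = g(s^n(\theta))$. Let $m, k$ be such that
$s^n(\theta) = (x, \sigma)$ with $x = a^m(o)$, $\sigma = d_x^k(\alpha_x)$, and $m', k'$ such that $s^{n'}(\theta) =
(x', \sigma')$ with $x' = a^{m'}(o)$, $\sigma' = d_{x'}^{k'}(\alpha_{x'})$. We then have $f(x')^{\sigma'\pi} = f(x)^\sigma$, i.e.
$f(x')^{\sigma'\pi\sigma^{-1}} = f(x)$. Since $\sigma'\pi\sigma^{-1} \in G$, by frugality of $o, a$ we get $m = m'$, and
then $x = x'$. Hence $\sigma'\pi\sigma^{-1} \in G_{f(x)}$, and $\sigma' \in G_{f(x)}\sigma H$. By frugality of $\alpha_x, d_x$
this implies $k = k'$, and we get $n = n'$.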
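
The idea behind Theorem 4 can be observed by brute force: applying one representative of each double coset of the stabiliser and $H$ to a $G$-representative yields one representative per $H$-orbit. The sketch below is an illustration on an invented example (the symmetric group on three points, with $H$ generated by the transposition of 0 and 1); it enumerates double cosets exhaustively instead of using the successor functions of the theorem, and its helper names and the composition convention are assumptions.

```python
from itertools import permutations


def compose(p, q):
    """Product of two permutations: apply p first, then q."""
    return tuple(q[i] for i in p)


def act(x, p):
    """Operation of the permutation p on the point x."""
    return p[x]


def double_coset_reps(L, G, R):
    """First element (in the order of G) of each double coset L s R."""
    seen, reps = set(), []
    for s in G:
        if s not in seen:
            reps.append(s)
            seen.update(compose(compose(l, s), r) for l in L for r in R)
    return reps


G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]

x = 0                                       # the single G-orbit representative
G_x = [p for p in G if act(x, p) == x]      # its stabiliser in G
h_reps = [act(x, s) for s in double_coset_reps(G_x, G, H)]

# Brute-force H-orbits, for comparison.
h_orbits = {frozenset(act(y, h) for h in H) for y in range(3)}
```

Each $H$-orbit then contains exactly one element of `h_reps`, which is what the refinement to a subgroup is meant to achieve.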
3.3 Permuting Coordinates

The standard operation on a Cartesian product is not always convenient, and the 
fair representation of a complex structure as a tuple may require the possibility 
to allow some permutation of coordinates. For example, forests of $k$ trees of size $n$
can be represented as $k$-tuples of trees, but the ordering of coordinates imposed
by the Cartesian product is too restrictive to encompass forest isomorphisms. 

More generally, given a group $G$ operating on $A$ and $H \le S_n$ (where $S_n$ is
the symmetric group on $\{1..n\}$), we will consider (in some special cases) the
independent operation of $G$ on the coordinates of $A^n$ mixed with the operation
of $H$ on the coordinate indices. This operation is actually linked with the wreath
product $G \wr H$, but since defining $G \wr H$ and then its operation on $A^n$ (which is not
the standard extension of an operation on $A$) would be tedious, we only define the
equivalence relation on $A^n$ induced by them: $(x_1, \ldots, x_n) \sim_{G \wr H} (y_1, \ldots, y_n) \Leftrightarrow
\exists \sigma_1, \ldots, \sigma_n \in G, \exists \pi \in H \mid \forall i \in \{1..n\},\ x_i^{\sigma_i} = y_{i^\pi}$.

We have a trivial case with $H = I$. The operation of $G \wr I$ on $A^n$ is however
similar to the standard operation of $G \times \cdots \times G$ on $A^n$ (the Cartesian product of
groups is naturally defined with the componentwise product, e.g. $(\sigma, \pi)(\sigma', \pi') =
(\sigma\sigma', \pi\pi')$), for which theorem 3 provides an enumeration. Of course, the remark
following the proof of theorem 3 applies, since in the Cartesian product of groups
coordinates are independent.

This trivial case can be generalised in the following way: suppose there exists
a partition $X_1, \ldots, X_k$ of $\{1..n\}$ and $K_1, \ldots, K_k$, where each $K_i$ is a permutation
group on $X_i$, such that $H$ is isomorphic to $K_1 \times \cdots \times K_k$ (this can be easily
tested on a generator set of $H$). Then the operation of $G \wr H$ on $A^n$ is similar
to the operation of $(G \wr K_1) \times \cdots \times (G \wr K_k)$ on $A^{|X_1|} \times \cdots \times A^{|X_k|}$ and we can
compose the $G \wr K_i$-enumerations through theorem 3.

Of course, not every $H$ can be decomposed in this way, and it seems very
difficult to devise a general $G \wr H$-enumeration of $A^n$. Below, we will solve only
two special cases: the symmetric product with $H = S_n$ and the cyclic product
with $H = C_n$, which is the group generated by the permutation $(1\ 2\ \ldots\ n)$.



3.4 The Symmetric Product 

Theorem 5. If $o, a$ is a $G$-enumeration of $A$ through $f : A' \to A$, and
$n \in \mathbb{N}$, let $g : A'^n \to A^n$ be the standard componentwise extension of $f$,
$o^\ast = (o, \ldots, o) \in A'^n$ and $s((x_1, \ldots, x_n)) = (x_1, \ldots, x_{k-1}, x'_k, \ldots, x'_k)$ where
$k = \max\{i \ge 1 \mid a(x_i) \text{ is defined}\}$ and $x'_k = a(x_k)$. Then $o^\ast, s$ is a $G \wr S_n$-enumeration
of $A^n$ through $g$.

Proof. It is clear that $\forall m \in \mathbb{N}$, $s^m(o^\ast)$ is of the form $(a^{m_1}(o), \ldots, a^{m_n}(o))$ with
$m_1 \le \cdots \le m_n$, and conversely that every such tuple (as long as it is defined) is
obtained as a $s^m(o^\ast)$.

$\forall (x_1, \ldots, x_n) \in A^n$, by completeness of $o, a$ we have $\exists m_1, \ldots, m_n \in \mathbb{N}$,
$\exists \sigma_1, \ldots, \sigma_n \in G \mid \forall i,\ x_i^{\sigma_i} = f(a^{m_i}(o))$. By sorting the list $m_1, \ldots, m_n$ we get
a permutation $\pi \in S_n$ such that $m_{1^\pi} \le \cdots \le m_{n^\pi}$, and then by the previous
remark a $m$ such that $s^m(o^\ast) = (a^{m_{1^\pi}}(o), \ldots, a^{m_{n^\pi}}(o))$, and therefore
$(x_1, \ldots, x_n) \sim_{G \wr S_n} g(s^m(o^\ast))$.

Now suppose $g(s^m(o^\ast)) \sim_{G \wr S_n} g(s^{m'}(o^\ast))$. For some $m_1 \le \cdots \le m_n$
and $m'_1 \le \cdots \le m'_n$ we then have $(f(a^{m_1}(o)), \ldots, f(a^{m_n}(o))) \sim_{G \wr S_n}
(f(a^{m'_1}(o)), \ldots, f(a^{m'_n}(o)))$, which means $\exists \sigma_1, \ldots, \sigma_n \in G, \exists \pi \in S_n$ such that
$\forall i,\ f(a^{m_i}(o))^{\sigma_i} = f(a^{m'_{i^\pi}}(o))$, and by frugality of $o, a$ we get $m_i = m'_{i^\pi}$. Therefore
we have $m'_{1^\pi} \le \cdots \le m'_{n^\pi}$ while $m'_1 \le \cdots \le m'_n$, which means that
$m'_i = m'_{i^\pi} = m_i$ and by the remark above we get $m = m'$.
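
The successor function of Theorem 5 is short enough to write down directly. In this minimal sketch tuples are Python tuples, the undefined case of the successor $a$ is represented by `None` (an implementation choice), and the toy alphabet of three canonical elements 0, 1, 2 is an assumption made for the example.

```python
from itertools import combinations_with_replacement


def symmetric_successor(x, a):
    """Advance the last coordinate that can be advanced and copy it to the right."""
    for k in range(len(x) - 1, -1, -1):
        nxt = a(x[k])
        if nxt is not None:
            return x[:k] + (nxt,) * (len(x) - k)
    return None                              # x was the last tuple


def a(v):
    """Successor on the canonical elements 0, 1, 2 of the toy enumeration."""
    return v + 1 if v < 2 else None


tuples, x = [], (0, 0)
while x is not None:
    tuples.append(x)
    x = symmetric_successor(x, a)

expected = list(combinations_with_replacement(range(3), 2))
```

The tuples produced are exactly the non-decreasing ones, that is, one tuple per multiset of canonical elements, which is why the standard library's `combinations_with_replacement` gives the same list in this toy case.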

3.5 The Cyclic Product 

The case H = C„ is much more difficult than the previous one, even though we 
will come up with a surprisingly simple algorithm for the successor function. In 
order to keep proofs readable (well, sort of...) we need to develop this algorithm 
and the corresponding proofs on strings on a finite alphabet V, on which we 
suppose an =-enumeration o, a. The strict linear ordering ^ of P is defined as 
a*(o) ^ a^(o) z < j. The link with our general framework will be through the 
implicit isomorphism between A" and the strings of length n. 

For $x \in \mathcal{V}^n$, we note $|x| = n$ the length of $x$. Then $\forall i \in \{1..n\}$, $x_i$ denotes
the $i$-th letter in $x$, so that $x = x_1 \ldots x_n$. The empty string is noted $\varepsilon$. For
$j \in \{1..n\}$, $x_{i,j}$ denotes $x_i \ldots x_j$ if $i \leq j$, and $\varepsilon$ otherwise; $x_{1,i}$ is a prefix of $x$.
For $y \in \mathcal{V}^n$, we note $x \sqcap y$ the greatest (w.r.t. length) common prefix of $x$ and
$y$. The strict lexicographic ordering is defined$^1$ as $x \sqsubset y \Leftrightarrow |x \sqcap y| < |x|$ and
$x_{1+|x \sqcap y|} \prec y_{1+|x \sqcap y|}$. We give without proofs the three following properties:

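$\forall x, y \in \mathcal{V}^n, \forall m \in \mathbb{N},\ x \sqsubseteq y \Rightarrow x^m \sqsubseteq y^m$ (1)

$\forall x, x' \in \mathcal{V}^n, \forall y, y' \in \mathcal{V}^m,\ xy \sqsubseteq x'y' \Rightarrow x \sqsubseteq x'$ (2)

$\forall x, x' \in \mathcal{V}^n, \forall y, y' \in \mathcal{V}^m,\ x \sqsubset x' \Rightarrow xy \sqsubset x'y' \text{ and } yx \sqsubset yx'$ (3)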

A less obvious property that we will need is: 

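Lemma 1. $\forall x, x', y, y' \in \mathcal{V}^n$, if $x \sqsubseteq y$, $y \sqsubset y'$ and $|x \sqcap x'| > |y \sqcap y'|$ then $x' \sqsubset y'$.

Proof. Let $k = |y \sqcap y'| + 1$; we have $y_{1,k} \sqsubset y'_{1,k}$, but $x'_{1,k} = x_{1,k} \sqsubseteq y_{1,k}$ by (2),
and therefore $x'_{1,k} \sqsubset y'_{1,k}$ and $x' \sqsubset y'$ by (3).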

In the reasoning below we will need a rather unusual operator on strings:
for $x \in \mathcal{V}^k$ and $n \in \mathbb{N}$, $\overline{x}^{\,n}$ is the string of length $n$ defined by $\forall i$, $(\overline{x}^{\,n})_i =
x_{(i-1 \bmod k)+1}$. Hence $x$ is simply repeated as much as necessary to get a string
of length $n$.
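
As a small illustration of this operator, here is a minimal sketch in which strings are modelled as Python tuples. The function name `repeat_to_length` is hypothetical; the 0-based index `i % k` plays the role of the 1-based $(i-1 \bmod k)+1$.

```python
def repeat_to_length(x, n):
    """Repeat the non-empty string x as often as needed and cut it to length n."""
    k = len(x)
    # 0-based position i of the result carries letter i mod k of x.
    return tuple(x[i % k] for i in range(n))
```

For a string of length 3 and $n = 7$ the result consists of two full copies followed by the first letter.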

For $x \in \mathcal{V}^n$, let $\kappa(x) = \max\{i \geq 1 \mid a(x_i) \text{ is defined}\}$, which may obviously
be undefined (in which case $x$ is maximal w.r.t. $\sqsubset$). This is the index at which $x$
should be increased in order to get its successor in the lexicographic order. Now,
let $\nu(x) = \overline{x_{1,k-1}\,a(x_k)}^{\,n}$, where $k = \kappa(x)$. It is clear that $x \sqsubset \nu(x)$, and that the
smaller $k$ is, the bigger the interval between $x$ and $\nu(x)$ is likely to be.

$^1$ Yes, we will only compare strings of the same length.

We say that $x$ has the smallest prefix property, or s.p.p. in short, iff $\forall i, j \geq 1$,
if $i + j \leq |x| + 1$ then $x_{1,j} \sqsubseteq x_{i,i+j-1}$. This property is preserved by $\nu$.

Lemma 2. If $x \in \mathcal{V}^n$ has the s.p.p. then so does $\nu(x)$ if defined.

Proof. Let $k = \kappa(x)$, $z = \nu(x)$, $i, j \geq 1$ such that $i + j \leq n + 1$, and $r = (i - 1 \bmod k) + 1$;
we have $z_{i,i+j-1} = z_{r,r+j-1}$ by definition of $\nu$. Hence we have to prove that
$z_{1,j} \sqsubseteq z_{r,r+j-1}$. If $r = 1$ this is obvious, so suppose $r > 1$ and consider two cases.

If $r + j - 1 < k$ then $z_{r,r+j-1} = x_{r,r+j-1} \sqsupseteq x_{1,j}$ since $x$ has the s.p.p. But
here $j < k$, thus $z_{1,j} = x_{1,j}$.

If $r + j - 1 \geq k$ then $x_{r,r+j-1} \sqsubset z_{r,r+j-1}$, and $|x_{r,r+j-1} \sqcap z_{r,r+j-1}| = k - r \leq
j - 1$, and we also have $k - r < k - 1$, so that $|x_{1,j} \sqcap z_{1,j}| = \min(j, k - 1) >
|x_{r,r+j-1} \sqcap z_{r,r+j-1}|$. We still have $x_{1,j} \sqsubseteq x_{r,r+j-1}$, hence by lemma 1 we get
$z_{1,j} \sqsubseteq z_{r,r+j-1}$.

In order to turn $\nu$ into a successor function among the strings that have the
s.p.p. we still have to prove that they will all be reached. This is where the use 
of a linear ordering on strings shows its convenience. 

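Lemma 3. If $x \sqsubset y \sqsubset \nu(x)$ then $y$ does not have the s.p.p.

Proof. Let $z = \nu(x)$, $n = |x|$ and $k = \kappa(x)$. By definition of $\kappa$ we have $x_{1,k} \sqsubset y_{1,k}$,
and from $y \sqsubseteq z$ by (2) we get $y_{1,k} \sqsubseteq z_{1,k}$. Hence by definition of $\nu$ we have
$y_{1,k} = z_{1,k}$. When we let $j = |y \sqcap z| + 1$, we obviously have $y_j \prec z_j$. The
division of $j - 1$ by $k$ yields $j - 1 = qk + r$, and we have $y_{qk+1,j} \sqsubset z_{qk+1,j}$. But
$z = \overline{z_{1,k}}^{\,n} = \overline{y_{1,k}}^{\,n}$, thus $z_{qk+1,j} = y_{1,r+1}$. Therefore $y_{qk+1,j} \sqsubset y_{1,r+1}$, which
contradicts the s.p.p. for $y$.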

We say that $x \in \mathcal{V}^n$ is minimal iff $\forall i \leq n$, $x \sqsubseteq x_{i,n}x_{1,i-1}$. This really
means that $x$ is the smallest in the set of strings that can be obtained from $x$ by
shifting round its letters, i.e. in its $C_n$-orbit. It is obvious that all minimal strings
have the s.p.p.: let $j$ such that $i + j \leq n + 1$; then from $x \sqsubseteq x_{i,n}x_{1,i-1}$ we get
$x_{1,j} \sqsubseteq x_{i,i+j-1}$ by (2). The reverse however is not true, and among the powers of
$\nu$ from $o^n$ (the smallest minimal string of length $n$), we still have to test for the
minimal ones. From the definition of minimality this seems to require the actual
use of the ordering relation $\prec$ on $\mathcal{V}$. But remember that we have not defined any
orders on the enumerated sets, and that we are only allowed successor functions 
and initial elements. Fortunately, this test can be replaced by a much simpler 
and more efficient one. 

Lemma 4. If $x$ has the s.p.p. then $\nu(x)$ is minimal iff $|x| \bmod \kappa(x) = 0$.

Proof. Let $n = |x|$, $k = \kappa(x)$, $z = \nu(x)$. We first prove the if part; let $i \leq n$. If
$i = 1$ then $z \sqsubseteq z_{i,n}z_{1,i-1}$ is trivial, so we take $i > 1$. We have $k \leq n$ and $n$ is a
multiple of $k$. First suppose $k = n$; then $z_{1,n-i+1} = x_{1,n-i+1} \sqsubseteq x_{i,n}$ since $x$ has
the s.p.p.; we have $x_{i,n} \sqsubset z_{i,n}$ by definition of $\nu$, and therefore $z \sqsubset z_{i,n}z_{1,i-1}$ by (3).
Now supposing $n = mk$ with $m \geq 2$, we have $z = (z_{1,k})^m$. If we let $r =
i - 1 \bmod k$, we have $z_{i,n}z_{1,i-1} = z_{r+1,n}z_{1,r} = (z_{r+1,r+k})^m$ (since we have $r + k <
2k \leq n$). By lemma 2, $z$ has the s.p.p. so that $z_{1,k} \sqsubseteq z_{r+1,r+k}$; thus by (1) we
get $z \sqsubseteq z_{i,n}z_{1,i-1}$.

We now prove the contrapositive of the only if part, i.e. we suppose that the division of $n$ by $k$
yields $n = qk + r$ with $0 < r < k$. Let $i = qk + 1$; we have $z_{i,n} = z_{1,r}$ by
definition of $\nu$. We also have $z_{1,k-r} = x_{1,k-r} \sqsubseteq x_{r+1,k}$ since $x$ has the s.p.p. and
$x_{r+1,k} \sqsubset z_{r+1,k}$ by definition of $\nu$. Therefore, $z_{i,n}z_{1,k-r} = z_{1,r}z_{1,k-r} \sqsubset z_{1,r}z_{r+1,k}$
by (3), thus $z_{i,n}z_{1,i-1} \sqsubset z$ still by (3), which proves that $z$ is not minimal.

We may now define a successor function s for minimal strings, in a recursive 
way: 

$s(x) = $ if $|x| \bmod \kappa(x) = 0$ then $\nu(x)$ else $s(\nu(x))$.
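
The recursive definition can be tried out with the following minimal sketch. It assumes a hypothetical alphabet $\{0, \ldots, q-1\}$ with $o = 0$ and $a(i) = i + 1$, models strings as tuples, and unfolds the recursion into a loop. All function names are illustrative; only the successor `a` is consulted, never an ordering on letters.

```python
def make_successor(q):
    """Hypothetical =-enumeration of {0, ..., q-1}: o = 0, a(i) = i + 1."""
    def a(letter):
        # None encodes "a is undefined on the last element".
        return letter + 1 if letter + 1 < q else None
    return a


def kappa(x, a):
    """Largest 1-based index i such that a(x_i) is defined, or None."""
    for i in range(len(x), 0, -1):
        if a(x[i - 1]) is not None:
            return i
    return None


def nu(x, a, k):
    """Raise the k-th letter, keep the first k-1 letters, repeat the block to length |x|."""
    block = x[:k - 1] + (a(x[k - 1]),)
    return tuple(block[i % k] for i in range(len(x)))


def s(x, a):
    """Successor among minimal strings: iterative form of the recursive definition."""
    while True:
        k = kappa(x, a)
        if k is None:
            return None          # x is the last string, s is undefined
        z = nu(x, a, k)
        if len(x) % k == 0:      # the test of lemma 4
            return z
        x = z                    # otherwise recurse on nu(x)


def minimal_strings(n, q):
    """Collect the powers of s starting from o^n."""
    a = make_successor(q)
    x, out = (0,) * n, []
    while x is not None:
        out.append(x)
        x = s(x, a)
    return out
```

For instance, `minimal_strings(4, 2)` walks from `(0, 0, 0, 0)` to `(1, 1, 1, 1)` and, under the stated assumptions, visits exactly one representative of each cyclic orbit.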



Theorem 6. $\{s^m(o^n) \mid m \in \mathbb{N}\}$ is the set of minimal strings of $\mathcal{V}^n$.

Proof. If $x \in \mathcal{V}^n$ is minimal, then it has the s.p.p., and so do the $\nu^i(x)$ by lemma
2, and they are the only elements greater than $x$ having the s.p.p. by lemma 3.
Hence by lemma 4 $s(x)$ is minimal and no $y$ is minimal if $x \sqsubset y \sqsubset s(x)$. Let $l$
be the last element in the enumeration $o, a$; since $o^n$ and $l^n$ are minimal, then
$\exists m \in \mathbb{N}$ such that $s^m(o^n) = l^n$. There is no element of $\mathcal{V}^n$ greater than $l^n$, and it is
clear that $s(l^n)$ is undefined, hence all minimal elements of $\mathcal{V}^n$ are in the set
$\{s^m(o^n) \mid m \in \mathbb{N}\}$, and only them.

Termination of s can not be questioned from what precedes, but we may still 
raise some suspicion about the complexity of the number of recursive calls, so 
let us relieve them at once. 

Lemma 5. If $x$ has the s.p.p. and $\kappa(x) < |x|$ then $\kappa(\nu(x)) > \kappa(x)$.

Proof. Let $k = \kappa(x)$ and $z = \nu(x)$; since $x$ has the s.p.p. we have $x_1 \preceq x_k$, and by
definition of $\kappa$ we know that $a(x_k)$ is defined. Since $k < |x|$ we have $z_{k+1} = x_1$,
therefore $a(z_{k+1})$ is defined, and $\kappa(z) \geq k + 1$.

Hence the number of recursive calls is bounded by $|x|$. It is easy to see that the
complexity of $s$ is $O(n^2 k' + nk)$ where $n$ is the length of the input string $x$ (each
letter counted as 1), $k$ the maximal time for computing $a(x_i)$ if defined, and $k'$ the
time taken by $a$ on the last element $l$. $k$ and $k'$ are independent of $n$. This bound
can be reached; the reader may check that on a; = of length 2n+l prime, 

which is clearly minimal, the computation of s(a;) = = o^"a(o)”“*'^ 

performs n + 1 computations of a(o) and computations of a{l). 

Theorem 7. If $o, a$ is a $G$-enumeration of $A$ through $f : A' \to A$, and $n \in \mathbb{N}$,
let $g$ be the componentwise extension of $f$, $o^n = (o, \ldots, o) \in A'^n$, and $s$ defined
as above, then $o^n, s$ is a $G \wr C_n$-enumeration of $A^n$ through $g$.
Proof. Every defined $s^m(o^n)$ is obviously of the form $(a^{m_1}(o), \ldots, a^{m_n}(o))$, and
can be considered as a string on $\mathcal{V} = \{a^k(o) \mid k \in \mathbb{N}\}$, so that the framework
used above applies.

Letting $x \in A^n$, we have $\exists m_1, \ldots, m_n \in \mathbb{N}$, $\exists \sigma_1, \ldots, \sigma_n$ such that $\forall i$, $x_i^{\sigma_i} =
f(a^{m_i}(o))$. Let $\pi \in C_n$, and $m'_i = m_{i^\pi}$ such that $a^{m'_1}(o) \ldots a^{m'_n}(o)$ is minimal
(i.e. the smallest w.r.t. $\sqsubset$ of the strings that can be obtained this way).
By theorem 6, $\exists k \in \mathbb{N}$ such that $s^k(o^n) = a^{m'_1}(o) \ldots a^{m'_n}(o)$, hence $g(s^k(o^n)) =
(f(a^{m'_1}(o)), \ldots, f(a^{m'_n}(o))) \sim_{G \wr C_n} x$, which proves completeness.

Suppose $g(s^m(o^n)) \sim_{G \wr C_n} g(s^{m'}(o^n))$, which by the remark above translates
to $(f(a^{m_1}(o)), \ldots, f(a^{m_n}(o))) \sim_{G \wr C_n} (f(a^{m'_1}(o)), \ldots, f(a^{m'_n}(o)))$, and then to
$\exists \sigma_1, \ldots, \sigma_n \in G$, $\exists \pi \in C_n$ such that $\forall i$, $f(a^{m_i}(o))^{\sigma_i} = f(a^{m'_{i^\pi}}(o))$. By frugality
of $o, a$ we get $\forall i$, $m_i = m'_{i^\pi}$. Hence $s^m(o^n)$ and $s^{m'}(o^n)$ are both in the same
$C_n$-orbit, and since they are minimal by theorem 6, they must be equal. Since
$m < m' \Rightarrow s^m(o^n) \sqsubset s^{m'}(o^n)$, we must have $m = m'$.

4 Application to Finite Monadic Sorted Algebras 

Given a finite set $S$ of sorts, $\forall s, t \in S$, we will consider the monadic types $s$
(for constants of type $s$), $s \to o$ (for predicates) and $s \to t$ (for functions);
a monadic signature $\Sigma$ is a tuple of monadic types. A finite $\Sigma$-algebra $\mathcal{A}$ is
given by a family of finite carrier sets $(A_s)_{s \in S}$ such that $\forall s, t \in S$, $A_s \neq \emptyset$ and
$s \neq t \Rightarrow A_s \cap A_t = \emptyset$, and by a tuple $\mathcal{A}$ such that $\forall i$, if $\Sigma_i = s$ then $\mathcal{A}_i \in A_s$, if
$\Sigma_i = s \to o$ then $\mathcal{A}_i \subseteq A_s$, and if $\Sigma_i = s \to t$ then $\mathcal{A}_i$ is a function from $A_s$ to
$A_t$.

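Two $\Sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ are isomorphic iff there is a function $\sigma$ such that
$\forall s \in S$, $\sigma$ is bijective from $A_s$ to $B_s$, and $\forall i$, if $\Sigma_i = s$ then $\sigma(\mathcal{A}_i) = \mathcal{B}_i$, if
$\Sigma_i = s \to o$ then $\sigma(\mathcal{A}_i) = \{\sigma(x) \mid x \in \mathcal{A}_i\} = \mathcal{B}_i$, and if $\Sigma_i = s \to t$ then
$\sigma(\mathcal{A}_i) = \{(\sigma(x), \sigma(y)) \mid (x, y) \in \mathcal{A}_i\} = \mathcal{B}_i$.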

Since the set of finite $\Sigma$-algebras is infinite, we will only consider the isomorphism
relation within the finite set $\mathfrak{A}$ of $\Sigma$-algebras for a fixed family of carrier
sets $(A_s)_{s \in S}$. Then the $\sigma$ above are permutations of $\biguplus_{s \in S} A_s$, and more precisely
are the elements of the product $\mathcal{G}$ of the $\mathrm{Sym}\, A_s$. Then the formulas for
$\sigma(\mathcal{A}_i)$ above can easily be proved to define operations of $\mathcal{G}$ on the corresponding
sets (depending on $\Sigma_i$) and can therefore be noted $\mathcal{A}_i^\sigma$. The sets operated
upon will be noted: $\mathcal{C}_s$ for the set of constants of type $s$, i.e. $\mathcal{C}_s = A_s$, $\mathcal{P}_s$ the
power set of $A_s$, and $\mathcal{F}_s^t = A_t^{A_s}$. Remark that if $\mathcal{A}_i$ is a function $f$, we have
$\forall x$, $f^\sigma(x^\sigma) = f(x)^\sigma$.

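Then $\mathfrak{A}$ is a Cartesian product of sets $\mathcal{C}_s$, $\mathcal{P}_s$, $\mathcal{F}_s^t$, and $\forall \mathcal{A}, \mathcal{B} \in \mathfrak{A}$, $\mathcal{A}$ is isomorphic
to $\mathcal{B}$ iff $\exists \sigma \in \mathcal{G}$ such that $\forall i$, $\mathcal{A}_i^\sigma = \mathcal{B}_i$, i.e. $\mathcal{A}^\sigma = \mathcal{B}$ under the standard extension
of the previous operations to tuples. Therefore, theorems 3 and 4 provide a
$\mathcal{G}$-enumeration of $\mathfrak{A}$ from $\mathcal{G}$-enumerations of the sets $\mathcal{C}_s$, $\mathcal{P}_s$, $\mathcal{F}_s^t$.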

Finding a $\mathcal{G}$-enumeration of $\mathcal{C}_s$ is trivial, and any $x \in \mathcal{C}_s$ may serve as initial
element since its $\mathcal{G}$-orbit is $\mathcal{C}_s$, and then $x, \emptyset$ is such an enumeration. The case of
predicates is scarcely less trivial, since any two sets of same cardinality can be put
in 1-1 correspondence; let $n = |A_s|$, $o, a$ be any =-enumeration of $\mathbb{N}_n = \{0..n\}$,
and a function $f : \mathbb{N}_n \to \mathcal{P}_s$ such that $\forall i \in \mathbb{N}_n$, $|f(i)| = i$, then by theorem 1,
$o, a$ is a $\mathcal{G}$-enumeration of $\mathcal{P}_s$ through $f$.



4.1 Enumeration of $\mathcal{F}_s^t$ with $s \neq t$

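For any two $k, n \in \mathbb{N}$ such that $0 < k \leq n$, a $k$-partition of $n$ is a tuple $p \in \mathbb{N}^k$
such that $0 < p_1 \leq \cdots \leq p_k$ and $\sum_{i=1}^k p_i = n$; we note $\mathrm{Part}_k\, n$ the set of
$k$-partitions of $n$. For example, $\mathrm{Part}_3\, 6 = \{(1, 1, 4), (1, 2, 3), (2, 2, 2)\}$.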
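
The definition can be checked on small cases with the sketch below. The generator `k_partitions` and its auxiliary parameter `smallest` are hypothetical names; it is one straightforward way to list $\mathrm{Part}_k\, n$ in lexicographic order, not necessarily the enumeration used by the author.

```python
def k_partitions(n, k, smallest=1):
    """Yield the nondecreasing k-tuples of positive integers that sum to n.

    `smallest` is the least value allowed for the first coordinate; it keeps
    the tuple nondecreasing across recursive calls. Requires 0 < k <= n.
    """
    if k == 1:
        if n >= smallest:
            yield (n,)
        return
    # The first part cannot exceed n // k, otherwise the later (larger or
    # equal) parts would overshoot the total.
    for first in range(smallest, n // k + 1):
        for rest in k_partitions(n - first, k - 1, first):
            yield (first,) + rest
```

Calling `list(k_partitions(6, 3))` reproduces the three tuples of the example above.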

With any $f \in \mathcal{F}_s^t$ we associate the cardinality $\tau(f)$ of its image $f(A_s) \subseteq A_t$.
We have $1 \leq \tau(f) \leq \min(n, n')$, where $n$ is the cardinality of $A_s$ and $n'$ the
cardinality of $A_t$, and it is easy to see that $\forall \sigma \in \mathcal{G}$, $\tau(f^\sigma) = \tau(f)$, i.e. that $\tau$
is invariant. Since it is trivial to provide a =-enumeration of integers from 1 to
$\min(n, n')$, in order to obtain a $\mathcal{G}$-enumeration of $\mathcal{F}_s^t$ we need only, by theorem 2,
provide a $\mathcal{G}$-enumeration of $\mathcal{F}_k = \{f \in \mathcal{F}_s^t \mid \tau(f) = k\}$ for $1 \leq k \leq \min(n, n')$.

For any $f \in \mathcal{F}_k$ and any $y \in f(A_s)$, the cardinality $c_f(y)$ of $f^{-1}(y)$ is a
non-zero integer, and we have $\sum_{y \in f(A_s)} c_f(y) = n$, so that by sorting the $c_f(y)$'s
we may associate with $f$ a partition $p(f) \in \mathrm{Part}_k\, n$. But $\forall \sigma \in \mathcal{G}$, we have
$(f^\sigma)^{-1}(y^\sigma) = f^{-1}(y)^\sigma$, so that $c_{f^\sigma}(y^\sigma) = c_f(y)$, and the integers obtained by
$c_{f^\sigma}$ and $c_f$ are the same even though they are not obtained in the same order,
which is removed by the sorting: $p(f^\sigma) = p(f)$.
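
The invariants $\tau(f)$ and $p(f)$ are easy to compute for a concrete function. In the sketch below a function from $A_s$ to $A_t$ is modelled as a Python dictionary; the names `tau`, `fibre_partition` and `relabel` are hypothetical and only illustrate the definitions.

```python
from collections import Counter


def tau(f):
    """Cardinality of the image of f (f is a dict from A_s to A_t)."""
    return len(set(f.values()))


def fibre_partition(f):
    """Sorted tuple of the fibre sizes c_f(y), for y ranging over the image of f."""
    return tuple(sorted(Counter(f.values()).values()))


def relabel(f, sigma_s, sigma_t):
    """Image of f under a pair of bijections, so that the result maps sigma_s(x) to sigma_t(f(x))."""
    return {sigma_s[x]: sigma_t[y] for x, y in f.items()}
```

Relabelling the domain and the codomain by bijections changes the function but leaves both values unchanged, which is the invariance used in the text.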

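The converse also holds; if $p(f) = p(g)$, then there is a $\pi \in \mathrm{Sym}\, A_t$ such that
$\forall y \in A_t$, $c_f(y) = c_g(y^\pi)$, and then $\exists \sigma_y \in \mathrm{Sym}\, A_s$ such that $f^{-1}(y)^{\sigma_y} = g^{-1}(y^\pi)$.
Let $\sigma \in \mathcal{G}$ be the permutation that acts as $\pi$ on $A_t$ and as $\sigma_y$ on each $f^{-1}(y)$;
then $\forall x \in A_s$, let $y = f(x)$, then $x^{\sigma_y} \in g^{-1}(y^\pi)$, and
then $g(x^{\sigma_y}) = y^\pi = f(x)^\pi$. Since the $f^{-1}(y)$ and $A_t$ are disjoint, this translates
to $g(x^\sigma) = f(x)^\sigma$, which means that $g = f^\sigma$.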

Hence by taking a function $q : \mathrm{Part}_k\, n \to \mathcal{F}_k$ inverse to $p$, we are in a position
to apply theorem 1, so that any =-enumeration of $\mathrm{Part}_k\, n$, which need not be
described here, is a $\mathcal{G}$-enumeration of $\mathcal{F}_k$ through $q$.



4.2 Enumeration of $\mathcal{F}_s^s$

The analysis above fails for $\mathcal{F}_s^s$ since the domain and image of these functions
are no longer disjoint, and therefore cannot be considered independently. For
$f \in \mathcal{F}_s^s \subseteq A_s \times A_s$, we can see $f$ as a directed graph on $A_s$, though a special one
since each vertex has exactly one edge coming out of it. The structure of these
function graphs is easy to analyse.

Let $x_0 \in A_s$, define $x_i = f^i(x_0)$. Since $A_s$ is finite, $\exists i \leq j$ such that $f(x_j) = x_i$, i.e.
there is a cycle in $f$. Let $f'$ be the graph obtained from $f$ by removing the edge
from $x_j$ to $x_i$; this operation does not disconnect any vertex from $x_0$, but the
connected component of $x_0$ in $f'$ has one more vertex than edges, and is therefore
a tree. Hence the connected component of $x_0$ in $f$ consists of trees connected to the
cycle $x_i, \ldots, x_j$.

Let $n$ be the cardinality of $A_s$. The number of connected components of $f$
may vary from 1 to $n$ (for the identity function), and is clearly an invariant,
so that theorem 2 applies, and we are reduced to enumerating functions with
a fixed number $k$ of connected components. If $n_1, \ldots, n_k$ are their sizes, we have
$\sum_{i=1}^k n_i = n$, so that with each $f$ we may associate an element of $\mathrm{Part}_k\, n$, and
this is an invariant. Then we are once again able to restrict the enumeration to
function graphs corresponding to a fixed partition $p \in \mathrm{Part}_k\, n$.
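
The structure described above can be explored with the following sketch, where a function from $A_s$ to itself is modelled as a dictionary. The names `cycle_length` and `component_sizes` are hypothetical. The first follows the sequence $x_i = f^i(x_0)$ until a vertex repeats; the second groups vertices into connected components with a small union-find and returns the sorted sizes, i.e. the associated partition.

```python
from collections import Counter


def cycle_length(f, x0):
    """Length of the cycle reached from x0 by iterating f."""
    first_seen = {}
    x, step = x0, 0
    while x not in first_seen:
        first_seen[x] = step
        x = f[x]
        step += 1
    # x is the first repeated vertex; the cycle spans the steps in between.
    return step - first_seen[x]


def component_sizes(f):
    """Sorted tuple of the sizes of the connected components of the graph of f."""
    parent = {x: x for x in f}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in f.items():
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y
    return tuple(sorted(Counter(find(x) for x in f).values()))
```

On the identity function every vertex is its own component, while a function with a single cycle and a few grafted trees yields one large component.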

We call $\mathcal{V}_i$ the set of the connected function graphs of size $i$. Consider the case
where $k = 2$ and $p_1 \neq p_2$. Then we have to find a $\mathcal{G}$-enumeration of $\mathcal{V}_{p_1} \times \mathcal{V}_{p_2}$, and
$(f, g)$ is isomorphic to $(f', g')$ iff $\exists \sigma \in \mathcal{G}$ such that $(f^\sigma, g^\sigma) = (f', g')$ (since $f$ and $g$ are
disjoint), hence we may apply theorem 3, taking into account the independence
of coordinates. If however $p_1 = p_2$, then the coordinates may be swapped, and
we are in a position to perform a symmetric product of $\mathcal{V}_{p_1}$ according to
theorem 5. This is easily generalised to any $k$, by grouping the identical $p_i$'s.
Our problem is therefore reduced to finding a $\mathcal{G}$-enumeration of $\mathcal{V}_i$.

A trivial invariant of elements of $\mathcal{V}_i$ is the length $c$ of their cycle. Another
invariant is the partition of the sizes of the trees grafted to the cycles, although
this time the order in which these sizes are distributed is relevant. For $p \in \mathrm{Part}_c\, i$
it is sufficient to consider all possible permutations of $p$'s coordinates that are
minimal (w.r.t. lexicographic ordering) in their $C_c$-orbit. When the tuples $p$ thus
obtained have a period $d < c$, i.e. $d$ is the smallest integer such that $p^{(1\,2\,\ldots\,c)^d} = p$,
then we are in position to perform a $d$-fold cyclic product of $\frac{c}{d}$-tuples of trees. All
that is needed after this is the enumeration of trees of a given size, which is very
similar to the enumeration of connected components, though in a recursive way.



5 Conclusion and Future Work 

This analysis has led to the implementation in OCAML-2 of a system for isomorph- 
free generation of these monadic algebras. The system, called BiGFooT, cannot 
be presented here for lack of space. Let us mention one application though: by 
considering two sorts e and v and considering two functions from e to v, we 
can generate the directed multigraphs with fixed number of edges and vertices. 
Labels can be added by way of predicates, on v and e as well. 

Many things remain to be done in this line of work. It is first necessary to 
extend the work done in section 3 at least to dyadic logic. Dyadic functions and 
relations are as complex as graphs (see [1]), and we will certainly need some work 
in the line of [2] . We would also like to extend our theoretical framework to show 
how automorphism groups can be efficiently computed directly by the successor 
functions, as it is done in BiGFooT. And this system has to be completed: pruning 
techniques in the line of [4] should be included, etc. 